Kaidul Islam created NUTCH-2389:
-----------------------------------
Summary: Precise data parsing using Jsoup CSS selectors
Key: NUTCH-2389
URL: https://issues.apache.org/jira/browse/NUTCH-2389
Project: Nutch
Issue Type: New Feature
Components: parser
Affects Versions: 2.3
Reporter: Kaidul Islam
Assignee: Kaidul Islam
Fix For: 2.4
Currently, neither Nutch 1.x nor 2.x has a feature to extract/parse exact content
from specific websites. For my current project I've developed a plugin that uses
Jsoup CSS selectors to extract precise content for site-specific crawling.
Please let me know whether this feature seems relevant and is not already present
in Nutch. I also plan to port it to Nutch 1.x.
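
To illustrate the idea, here is a minimal sketch of selector-based extraction
with Jsoup. It is not the plugin code: the class name SelectorExtractor, the
extract() helper, the field names, and the selectors are all hypothetical, and
the sketch assumes the Jsoup jar is on the classpath. It only shows how a
per-site map of field names to CSS selectors could pull exact pieces of text
out of a page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.LinkedHashMap;
import java.util.Map;

public class SelectorExtractor {

    // Hypothetical helper (not part of Nutch or the proposed plugin):
    // for each field, take the text of the first element matching its selector.
    static Map<String, String> extract(String html, Map<String, String> fieldSelectors) {
        Document doc = Jsoup.parse(html);
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : fieldSelectors.entrySet()) {
            Element match = doc.select(entry.getValue()).first();
            if (match != null) {
                // Fields whose selector matches nothing are simply left out.
                fields.put(entry.getKey(), match.text());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Illustrative page; a real crawl would pass the fetched content instead.
        String html = "<html><body>"
                + "<div id=\"nav\">Home | About</div>"
                + "<h1 class=\"title\">Jsoup in Nutch</h1>"
                + "<span class=\"price\">42 USD</span>"
                + "</body></html>";

        // Assumed per-site configuration: field name -> CSS selector.
        Map<String, String> selectors = new LinkedHashMap<>();
        selectors.put("title", "h1.title");
        selectors.put("price", "span.price");

        extract(html, selectors).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

In a real plugin the selector map would presumably come from per-site
configuration, and the extracted fields would be stored with the parsed page
rather than printed.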