Hey Kiran,
drop me a line before you start; I will give it a try tomorrow (I hope).
--Roland
On 04.03.2013 14:13, kiran (JIRA) wrote:
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592179#comment-13592179 ]
kiran commented on NUTCH-961:
-----------------------------
No, Roland, not yet. I just switched to using the 1.x series, but I will try
porting this to 2.x this week.
Expose Tika's boilerpipe support
--------------------------------
Key: NUTCH-961
URL: https://issues.apache.org/jira/browse/NUTCH-961
Project: Nutch
Issue Type: New Feature
Components: parser
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Fix For: 1.7
Attachments: BoilerpipeExtractorRepository.java,
NUTCH-961-1.3-3.patch, NUTCH-961-1.3-tikaparser1.patch,
NUTCH-961-1.3-tikaparser.patch, NUTCH-961-1.4-dombuilder-1.patch,
NUTCH-961-1.5-1.patch, NUTCH-961v2.patch
Tika 0.8 comes with the Boilerpipe content handler, which can be used to strip
boilerplate from HTML pages and keep only their main text content. We should see
how we can expose Boilerpipe in the Nutch configuration.
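A minimal sketch of how such a configuration switch might look, assuming one boolean property that turns Boilerpipe on and a second property that names the extractor. The property names tika.boilerpipe and tika.boilerpipe.extractor, the helper class BoilerpipeConfig, and the use of java.util.Properties in place of Nutch's Hadoop Configuration are all hypothetical and not taken from the attached patches. Only the extractor class names and their de.l3s.boilerpipe.extractors package come from the Boilerpipe library itself.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Properties;
import java.util.Set;

/**
 * Hypothetical helper: decides which Boilerpipe extractor (if any) a Tika-based
 * parser should use. java.util.Properties stands in for Nutch's Hadoop
 * Configuration; the property names below are assumptions, not the patch's.
 */
final class BoilerpipeConfig {
    // Hypothetical property names.
    static final String ENABLED_KEY = "tika.boilerpipe";
    static final String EXTRACTOR_KEY = "tika.boilerpipe.extractor";
    static final String DEFAULT_EXTRACTOR = "ArticleExtractor";

    // A subset of the extractors shipped in de.l3s.boilerpipe.extractors.
    private static final Set<String> KNOWN_EXTRACTORS = Collections.unmodifiableSet(
        new LinkedHashSet<>(Arrays.asList(
            "ArticleExtractor", "DefaultExtractor", "CanolaExtractor",
            "KeepEverythingExtractor", "LargestContentExtractor")));

    private BoilerpipeConfig() {}

    /**
     * Returns the fully qualified extractor class name, or an empty Optional
     * when Boilerpipe is disabled (the parser then keeps the full page text).
     * Throws IllegalArgumentException for an unknown extractor name.
     */
    static Optional<String> resolveExtractor(Properties conf) {
        // Boolean.parseBoolean is case-insensitive and treats anything but "true" as false.
        boolean enabled = Boolean.parseBoolean(conf.getProperty(ENABLED_KEY, "false"));
        if (!enabled) {
            return Optional.empty();
        }
        String name = conf.getProperty(EXTRACTOR_KEY, DEFAULT_EXTRACTOR).trim();
        if (!KNOWN_EXTRACTORS.contains(name)) {
            throw new IllegalArgumentException(
                "Unknown Boilerpipe extractor '" + name + "', expected one of " + KNOWN_EXTRACTORS);
        }
        return Optional.of("de.l3s.boilerpipe.extractors." + name);
    }
}
```

The resolved class would then be loaded (for example through the extractor's static INSTANCE field) and handed to Tika's BoilerpipeContentHandler together with the downstream content handler. Defaulting to "off" keeps today's full-text behaviour unless a crawl explicitly opts in.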