Expose Tika's boilerpipe support
--------------------------------
Key: NUTCH-961
URL: https://issues.apache.org/jira/browse/NUTCH-961
Project: Nutch
Issue Type: New Feature
Components: parser
Reporter: Markus Jelsma
Fix For: 1.3, 2.0
Tika 0.8 ships with the Boilerpipe content handler, which can be used to detect
and remove boilerplate content from HTML pages. We should see how we can expose
Boilerpipe in the Nutch configuration.
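
One possible shape for this, sketched as a nutch-site.xml fragment. The
property names and values below are assumptions made for illustration only;
this issue does not define them. The idea is one switch to enable the
Boilerpipe handler in the Tika parser plugin and a second property to choose
the extraction algorithm.

```xml
<!-- Hypothetical properties: names and values are illustrative, not defined by this issue. -->
<property>
  <name>tika.boilerpipe</name>
  <value>true</value>
  <description>Assumed switch: wrap Tika's HTML content handler in the
  Boilerpipe content handler so boilerplate is dropped from the parsed
  text.</description>
</property>

<property>
  <name>tika.boilerpipe.extractor</name>
  <value>ArticleExtractor</value>
  <description>Assumed selector for the Boilerpipe extractor to use, for
  example ArticleExtractor or DefaultExtractor.</description>
</property>
```

Keeping the feature off by default would leave existing parse output unchanged
for installations that do not opt in.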