[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julien Nioche updated NUTCH-961:
--------------------------------
Fix Version/s: (was: 1.3)
Tika 0.8 has some issues with PDF parsing, so it would be better to build this on
the next Tika release instead. This won't be done as part of the 1.3 release, as
it is new functionality and not a bug fix.
> Expose Tika's boilerpipe support
> --------------------------------
>
> Key: NUTCH-961
> URL: https://issues.apache.org/jira/browse/NUTCH-961
> Project: Nutch
> Issue Type: New Feature
> Components: parser
> Reporter: Markus Jelsma
> Fix For: 2.0
>
>
> Tika 0.8 comes with the Boilerpipe content handler, which can be used to
> strip boilerplate content from HTML pages and keep only the main text. We
> should see how we can expose Boilerpipe in the Nutch configuration.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.