Hey Kiran,

Drop me a line before you start; I will give it a try tomorrow (I hope).

--Roland

On 04.03.2013 14:13, kiran (JIRA) wrote:
     [ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592179#comment-13592179 ]

kiran commented on NUTCH-961:
-----------------------------

No Roland, not yet. I just switched to using the 1.x series, but I will try 
porting this to 2.x this week.

Expose Tika's boilerpipe support
--------------------------------

                 Key: NUTCH-961
                 URL: https://issues.apache.org/jira/browse/NUTCH-961
             Project: Nutch
          Issue Type: New Feature
          Components: parser
            Reporter: Markus Jelsma
            Assignee: Markus Jelsma
             Fix For: 1.7

         Attachments: BoilerpipeExtractorRepository.java, 
NUTCH-961-1.3-3.patch, NUTCH-961-1.3-tikaparser1.patch, 
NUTCH-961-1.3-tikaparser.patch, NUTCH-961-1.4-dombuilder-1.patch, 
NUTCH-961-1.5-1.patch, NUTCH-961v2.patch


Tika 0.8 comes with the Boilerpipe content handler which can be used to extract 
boilerplate content from HTML pages. We should see how we can expose 
Boilerpipe in the Nutch configuration.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
