[ 
https://issues.apache.org/jira/browse/NUTCH-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney reopened NUTCH-1644:
-----------------------------------------
      Assignee: Lewis John McGibbney

I am reopening this issue.
The number of people we see wanting to use XPath in the form of an easily 
configurable plugin is overwhelming.
The penalty for not having such a plugin in Nutch is that people give up on 
writing their own ParseFilter and use Scrapy instead.
We need to be aware that not having this functionality in Nutch out-of-the-box 
is damaging for our project. 
Thank you [~cguzel] for posting this patch. I am sorry I didn't see it 
earlier.
 

> Should have a parser that uses xpath
> ------------------------------------
>
>                 Key: NUTCH-1644
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1644
>             Project: Nutch
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 2.2.1
>            Reporter: cihad güzel
>            Assignee: Lewis John McGibbney
>              Labels: parser, xpath
>             Fix For: 2.3
>
>         Attachments: NUTCH-1644.patch
>
>
> Users may want to parse some URLs via XPath, for example on blog or news 
> web sites. There should be a plugin that parses using XPath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
