[
https://issues.apache.org/jira/browse/NUTCH-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193609#comment-14193609
]
Lewis John McGibbney commented on NUTCH-1644:
---------------------------------------------
[~talat], the patch you attached here has a number of problems:
* The XML formatting is inconsistent throughout.
* It includes solr4-schema.xml, which no longer exists in 2.X.
* It includes references to article titles, authors and content within the
above schema as well as in solr-mapping.xml.
* It includes a bunch of local plugin nutch-site.xml configuration, which I am
not sure fits in with the existing plugin configuration.
* The package names are com.atlantbh.nutch where they should be
org.apache.nutch.
* The Java code is not formatted correctly.
* This appears to be an IndexingFilter as well...
* There seems to be an awful lot of code! Same with the XML!
* It is a patch for Git, not for SVN.
Thanks for uploading, but I feel that this needs a lot of work.
> Should have a parser that uses xpath
> ------------------------------------
>
> Key: NUTCH-1644
> URL: https://issues.apache.org/jira/browse/NUTCH-1644
> Project: Nutch
> Issue Type: New Feature
> Components: parser
> Affects Versions: 2.2.1
> Reporter: cihad güzel
> Assignee: Lewis John McGibbney
> Labels: parser, xpath
> Fix For: 2.4
>
> Attachments: NUTCH-1644.patch, filter-xpath.patch
>
>
> Users may want to parse some URLs via XPath, for example blog or news web
> sites. There should be a plugin that parses pages using XPath.