[ http://issues.apache.org/jira/browse/NUTCH-379?page=all ]
Chris A. Mattmann updated NUTCH-379:
------------------------------------
Attachment: NUTCH-379.Mattmann.100406.patch.txt
Small patch that at least gets started on fixing the larger issue of content
URLs and parser mapping: it forwards the content URL (as the ParserFactory
interface already expects) to the getParsers method in the ParserFactory.
> ParseUtil does not pass through the content's URL to the ParserFactory
> ----------------------------------------------------------------------
>
> Key: NUTCH-379
> URL: http://issues.apache.org/jira/browse/NUTCH-379
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 0.8, 0.9.0, 0.8.1
> Environment: Power Mac Dual G5, 2.0 GHz, although fix is independent
> of environment
> Reporter: Chris A. Mattmann
> Assigned To: Chris A. Mattmann
> Fix For: 0.8, 0.9.0, 0.8.1, 0.8.2
>
> Attachments: NUTCH-379.Mattmann.100406.patch.txt
>
>
> Currently the ParseUtil class, which the Fetcher calls to actually
> parse content, does not forward the content's URL for
> use in the ParserFactory. A bigger issue, however, is that the URL (and for
> that matter, the pathSuffix) is no longer used to determine which parsing
> plugin should be called. My colleague at JPL discovered that larger bug
> and will soon file a JIRA issue for it. In the meantime, this small
> patch at least sets up the forwarding of the content's URL to the
> ParserFactory.
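
For readers without the patch at hand, the forwarding described above can be
sketched roughly as follows. This is not the attached patch and does not use
the real Nutch classes: SketchContent, RecordingParserFactory, ParseUtilSketch
and selectParsers are hypothetical stand-ins, and parsers are represented as
plain strings. The only part taken from the issue is the idea that the
content's URL is handed to getParsers along with the content type. The
fallback to an empty string for a null URL is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Nutch's Content; only the two accessors used here.
final class SketchContent {
    private final String url;
    private final String contentType;

    SketchContent(String url, String contentType) {
        this.url = url;
        this.contentType = contentType;
    }

    String getUrl() { return url; }
    String getContentType() { return contentType; }
}

// Hypothetical stand-in for the ParserFactory: it records every URL it is
// given so the forwarding can be observed, and returns a dummy parser name.
final class RecordingParserFactory {
    final List<String> seenUrls = new ArrayList<>();

    String[] getParsers(String contentType, String url) {
        seenUrls.add(url);
        return new String[] { "parser-for:" + contentType };
    }
}

// Hypothetical stand-in for ParseUtil, reduced to the parser lookup step.
final class ParseUtilSketch {
    private final RecordingParserFactory factory;

    ParseUtilSketch(RecordingParserFactory factory) {
        this.factory = factory;
    }

    String[] selectParsers(SketchContent content) {
        // Forward the content's URL to the factory. A null URL falls back to
        // the empty string (an assumption of this sketch, not of the patch).
        String url = content.getUrl() != null ? content.getUrl() : "";
        return factory.getParsers(content.getContentType(), url);
    }

    public static void main(String[] args) {
        RecordingParserFactory factory = new RecordingParserFactory();
        ParseUtilSketch util = new ParseUtilSketch(factory);
        util.selectParsers(
            new SketchContent("http://example.org/report.pdf", "application/pdf"));
        // Print the URL the factory actually received.
        System.out.println(factory.seenUrls.get(0));
    }
}
```

Note that forwarding the URL alone does not make the factory use it to choose
a parsing plugin; that is the larger issue mentioned above.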