[ http://issues.apache.org/jira/browse/NUTCH-140?page=all ]
Chris A. Mattmann updated NUTCH-140:
------------------------------------
Attachment: NUTCH-140.20051502.patch.txt
An initial patch for NUTCH-140 for everyone's review.
> Add alias capability in parse-plugins.xml file that allows
> mimeType->extensionId mapping
> ----------------------------------------------------------------------------------------
>
> Key: NUTCH-140
> URL: http://issues.apache.org/jira/browse/NUTCH-140
> Project: Nutch
> Type: Improvement
> Components: fetcher
> Environment: Power Mac OS X 10.4, Dual Processor G5 2.0 GHz, 1.5 GB RAM,
> although bug is independent of environment
> Reporter: Chris A. Mattmann
> Assignee: Chris A. Mattmann
> Priority: Minor
> Attachments: NUTCH-140.20051502.patch.txt
>
> Jerome and I have been talking about an idea to address the current issue
> raised by Stefan G. about having a mapping of mimeType->list of pluginIds
> rather than mimeType->list of extensionIds in the parse-plugins.xml file.
> We've come up with the following proposed update that would seemingly fix
> this problem.
> We propose to have the concept of "aliases" in the parse-plugins.xml file,
> defined at the end of the file, something like:
> <parse-plugins>
> ....
> <mimeType name="text/html">
> <plugin id="parse-html"/>
> </mimeType>
> .....
>
> <aliases>
> <alias name="parse-html"
> extension-point="org.apache.nutch.parse.html.HtmlParser"/>
> ....
> <alias name="parse-html2" extension-point="my.other.html.Parser"/>
>
> ....
> </aliases>
> </parse-plugins>
> What do you guys think? This approach would be flexible enough to allow the
> mapping of extensionIds to mimeTypes, but without impacting the current
> "pluginId" concept.
> Comments welcome.
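
As a rough sketch of how the proposed <aliases> section could be consumed,
the Java snippet below reads a file in the layout proposed above and resolves
each mimeType's plugin ids through the alias table. The class name
ParsePluginAliases, the resolve method, and the fallback of keeping the plugin
id when no alias is defined are assumptions made for illustration; none of
this is taken from the attached patch or from the Nutch code base.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of Nutch or the patch.
class ParsePluginAliases {

    // Sample input following the layout proposed in the issue description.
    static final String SAMPLE =
        "<parse-plugins>"
      + "<mimeType name=\"text/html\">"
      + "<plugin id=\"parse-html\"/>"
      + "</mimeType>"
      + "<aliases>"
      + "<alias name=\"parse-html\""
      + " extension-point=\"org.apache.nutch.parse.html.HtmlParser\"/>"
      + "<alias name=\"parse-html2\" extension-point=\"my.other.html.Parser\"/>"
      + "</aliases>"
      + "</parse-plugins>";

    // Returns mimeType -> list of resolved ids, in document order.
    static Map<String, List<String>> resolve(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse parse-plugins XML", e);
        }

        // Step 1: collect alias name -> value of the extension-point attribute.
        Map<String, String> aliases = new HashMap<>();
        NodeList aliasNodes = doc.getElementsByTagName("alias");
        for (int i = 0; i < aliasNodes.getLength(); i++) {
            Element alias = (Element) aliasNodes.item(i);
            aliases.put(alias.getAttribute("name"), alias.getAttribute("extension-point"));
        }

        // Step 2: for each mimeType, map its plugin ids through the alias table.
        Map<String, List<String>> resolved = new LinkedHashMap<>();
        NodeList mimeNodes = doc.getElementsByTagName("mimeType");
        for (int i = 0; i < mimeNodes.getLength(); i++) {
            Element mimeType = (Element) mimeNodes.item(i);
            List<String> ids = new ArrayList<>();
            NodeList plugins = mimeType.getElementsByTagName("plugin");
            for (int j = 0; j < plugins.getLength(); j++) {
                String pluginId = ((Element) plugins.item(j)).getAttribute("id");
                // Assumption: a plugin id without an alias is kept unchanged.
                ids.add(aliases.getOrDefault(pluginId, pluginId));
            }
            resolved.put(mimeType.getAttribute("name"), ids);
        }
        return resolved;
    }

    public static void main(String[] args) {
        System.out.println(resolve(SAMPLE));
    }
}
```

Keeping the alias table separate from the mimeType entries means the
mimeType section still refers only to plugin ids, which matches the stated
goal of not impacting the current "pluginId" concept.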
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira