[ http://issues.apache.org/jira/browse/NUTCH-140?page=all ]

Jerome Charron closed NUTCH-140:
--------------------------------

Fix Version: 0.8-dev
Resolution: Fixed

I have committed the patch provided by Chris with some modifications
(http://svn.apache.org/viewcvs.cgi?rev=379403&view=rev):
* Some minor code reformatting.
* An extension id can be used directly in the parse-plugins.xml file without any alias definition (this will help in a transitional phase when we get an admin GUI).
* The API provides the ability to retrieve a parser by its extension id or its alias (getParserByExtensionId).
* Removed the deprecated methods.
* Made use of the new APIs in parse-mp3 and parse-rtf.

Thanks Chris

> Add alias capability in parse-plugins.xml file that allows mimeType->extensionId mapping
> ----------------------------------------------------------------------------------------
>
>          Key: NUTCH-140
>          URL: http://issues.apache.org/jira/browse/NUTCH-140
>      Project: Nutch
>         Type: Improvement
>   Components: fetcher
>  Environment: Power Mac OS X 10.4, Dual Processor G5 2.0 GHz, 1.5 GB RAM, although bug is independent of environment
>     Reporter: Chris A. Mattmann
>     Assignee: Chris A. Mattmann
>     Priority: Minor
>      Fix For: 0.8-dev
>  Attachments: NUTCH-140.20051502.patch.txt
>
> Jerome and I have been talking about an idea to address the current issue
> raised by Stefan G. about having a mapping of mimeType->list of pluginIds
> rather than mimeType->list of extensionIds in the parse-plugins.xml file.
> We've come up with the following proposed update that would seemingly fix
> this problem.
> We propose to have the concept of "aliases" in the parse-plugins.xml file,
> defined at the end of the file, something like:
> <parse-plugins>
>   ....
>   <mimeType name="text/html">
>     <plugin id="parse-html"/>
>   </mimeType>
>   .....
>
>   <aliases>
>     <alias name="parse-html"
>            extension-point="org.apache.nutch.parse.html.HtmlParser"/>
>     ....
>     <alias name="parse-html2" extension-point="my.other.html.Parser"/>
>
>     ....
>   </aliases>
> </parse-plugins>
> What do you guys think?
> This approach would be flexible enough to allow the
> mapping of extensionIds to mimeTypes, but without impacting the current
> "pluginId" concept.
> Comments welcome.
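
For readers who want to see the alias idea in concrete terms, here is a minimal sketch of the lookup discussed in this issue. It is not the committed Nutch code: the class AliasSketch and the methods readAliases and resolve are hypothetical names made up for illustration, and the attribute name "extension-point" is simply taken from the XML proposed in the issue description. The sketch assumes that an id with no alias entry is treated as an extension id directly, as described in the closing comment.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of the Nutch API.
public class AliasSketch {

  /** Reads every <alias name="..." extension-point="..."/> element into a name -> extension id map. */
  static Map<String, String> readAliases(String xml) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    Map<String, String> aliases = new HashMap<>();
    NodeList nodes = doc.getElementsByTagName("alias");
    for (int i = 0; i < nodes.getLength(); i++) {
      Element alias = (Element) nodes.item(i);
      aliases.put(alias.getAttribute("name"), alias.getAttribute("extension-point"));
    }
    return aliases;
  }

  /** Returns the aliased extension id, or the given id unchanged when no alias is defined for it. */
  static String resolve(Map<String, String> aliases, String id) {
    return aliases.getOrDefault(id, id);
  }

  public static void main(String[] args) throws Exception {
    String xml =
        "<parse-plugins>"
      + "<mimeType name=\"text/html\"><plugin id=\"parse-html\"/></mimeType>"
      + "<aliases>"
      + "<alias name=\"parse-html\" extension-point=\"org.apache.nutch.parse.html.HtmlParser\"/>"
      + "<alias name=\"parse-html2\" extension-point=\"my.other.html.Parser\"/>"
      + "</aliases>"
      + "</parse-plugins>";
    Map<String, String> aliases = readAliases(xml);
    // An aliased plugin id maps to its extension id.
    System.out.println(resolve(aliases, "parse-html"));
    // An id without an alias is passed through as-is.
    System.out.println(resolve(aliases, "some.unaliased.Parser"));
  }
}
```

Keeping the fallback in resolve means a parse-plugins.xml file can mix aliased plugin ids and raw extension ids without any change to the mimeType entries.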