[ http://issues.apache.org/jira/browse/NUTCH-140?page=comments#action_12360643 ]

Chris A. Mattmann commented on NUTCH-140:
-----------------------------------------

Hey Stefan,

  Mainly, it would be to make them more human-readable. Also, if I go in there
and define aliases for all of the parsing plugin extensionIds that currently
exist, users will have very little to tailor out of the box (similar to what I
already did for parse-plugins.xml, which ships with most of the mimeTypes in
the system already mapped). In my opinion (and of course, it's just my
opinion, so take it with a grain of salt), it's easier to read pluginIds such
as "parse-html" than something like "org.apache.nutch.parse.html.HtmlParser".
It's a lot fewer characters to type, too ;) Another advantage is that it
wouldn't change the way the system currently works, i.e., there would be no
direct impact on users who are already used to the mimeType->list of pluginIds
mapping in the parse-plugins.xml file.

Just my two cents.

Take care!

Cheers,
  Chris

> Add alias capability in parse-plugins.xml file that allows 
> mimeType->extensionId mapping
> ----------------------------------------------------------------------------------------
>
>          Key: NUTCH-140
>          URL: http://issues.apache.org/jira/browse/NUTCH-140
>      Project: Nutch
>         Type: Improvement
>   Components: fetcher
>  Environment: Power Mac OS X 10.4, Dual Processor G5 2.0 GHz, 1.5 GB RAM,
> although the issue is independent of environment
>     Reporter: Chris A. Mattmann
>     Assignee: Chris A. Mattmann
>     Priority: Minor
>
>  Jerome and I have been talking about an idea to address the current issue
> raised by Stefan G. about having a mapping of mimeType->list of pluginIds
> rather than mimeType->list of extensionIds in the parse-plugins.xml file.
> We've come up with the following proposed update, which we think would fix
> this problem.
>   We propose adding the concept of "aliases" to the parse-plugins.xml file,
> defined at the end of the file, something like:
>  <parse-plugins>
>    ....
>    <mimeType name="text/html">
>      <plugin id="parse-html"/>
>    </mimeType>
>    .....
>
>    <aliases>
>      <alias name="parse-html" extension-point="org.apache.nutch.parse.html.HtmlParser"/>
>      ....
>      <alias name="parse-html2" extension-point="my.other.html.Parser"/>
>      ....
>    </aliases>
>  </parse-plugins>
>
> What do you guys think? This approach would be flexible enough to allow the
> mapping of extensionIds to mimeTypes without impacting the current
> "pluginId" concept.
> Comments welcome.
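
For illustration only, here is a minimal Java sketch of how the proposed aliases might be resolved when the file is loaded: it reads the alias table first, then translates each mimeType's plugin ids into extensionIds. The class `AliasResolver`, its `resolve` method, and the `SAMPLE` document are hypothetical and are not part of Nutch. The sketch also assumes that an id with no matching alias is passed through unchanged, so existing mimeType->pluginId entries keep working. It uses only the JDK's standard DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical sketch of alias resolution for the proposed parse-plugins.xml format. */
public class AliasResolver {

    /** Hypothetical sample in the proposed format (elisions removed so it parses). */
    static final String SAMPLE =
        "<parse-plugins>"
        + "<mimeType name=\"text/html\"><plugin id=\"parse-html\"/></mimeType>"
        + "<mimeType name=\"text/plain\"><plugin id=\"parse-text\"/></mimeType>"
        + "<aliases>"
        + "<alias name=\"parse-html\" extension-point=\"org.apache.nutch.parse.html.HtmlParser\"/>"
        + "</aliases>"
        + "</parse-plugins>";

    /** Maps each mimeType to its ordered list of ids, with aliases replaced by extensionIds. */
    static Map<String, List<String>> resolve(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // Build the alias table: alias name -> extensionId (the extension-point attribute).
        Map<String, String> aliases = new HashMap<>();
        NodeList aliasNodes = doc.getElementsByTagName("alias");
        for (int i = 0; i < aliasNodes.getLength(); i++) {
            Element alias = (Element) aliasNodes.item(i);
            aliases.put(alias.getAttribute("name"), alias.getAttribute("extension-point"));
        }

        // Translate each mimeType's plugin ids through the alias table, preserving order.
        // Assumption: an id without an alias is kept as-is (plain pluginId behavior).
        Map<String, List<String>> result = new LinkedHashMap<>();
        NodeList mimeNodes = doc.getElementsByTagName("mimeType");
        for (int i = 0; i < mimeNodes.getLength(); i++) {
            Element mime = (Element) mimeNodes.item(i);
            List<String> ids = new ArrayList<>();
            NodeList plugins = mime.getElementsByTagName("plugin");
            for (int j = 0; j < plugins.getLength(); j++) {
                String id = ((Element) plugins.item(j)).getAttribute("id");
                ids.add(aliases.getOrDefault(id, id));
            }
            result.put(mime.getAttribute("name"), ids);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Prints {text/html=[org.apache.nutch.parse.html.HtmlParser], text/plain=[parse-text]}
        System.out.println(resolve(SAMPLE));
    }
}
```

Keeping the alias table separate from the mimeType entries is what lets users keep writing short ids like "parse-html" while the system still ends up with full extensionIds.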
