[ 
https://issues.apache.org/jira/browse/NUTCH-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16716596#comment-16716596
 ] 

Sebastian Nagel commented on NUTCH-2678:
----------------------------------------

Hi [~markus17], good idea to make the selection of the actual protocol 
implementation configurable per host. Two suggestions to improve it:
 - Keeping the map of hosts to protocol plugins in the plugin.xml requires 
users to recompile Nutch to change it (at least in distributed mode). Wouldn't 
it be easier for users if the mapping were defined, as usual, in {{conf/}}? It 
could be a text file with one {{<hostname> <tab> <plugin-name>}} entry per 
line. The PluginFactory gets the Configuration object passed in the 
constructor.
 - The method findExtension(...) is called for every URL, and even twice if no 
host-specific protocol is found. It would be more efficient to cache the 
results in a map <hostname, cacheId> or <protocol, cacheId>, respectively.
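
To make both points concrete, here is a rough sketch, not a patch. It assumes a tab-separated mapping file as proposed above. The class name {{HostProtocolMapping}}, the {{#}} comment convention, and the {{findByProtocol}} function (a stand-in for the existing, more expensive findExtension(...) lookup) are all hypothetical. The plugin names are only example values.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical helper: host-specific plugin selection with a cached protocol fallback. */
public class HostProtocolMapping {

  /** Host name (lower-cased) to plugin name, as read from the mapping file. */
  private final Map<String, String> hostToPlugin = new HashMap<>();

  /** Cache of protocol-based lookups, so the expensive lookup runs once per protocol. */
  private final Map<String, String> protocolCache = new ConcurrentHashMap<>();

  /** Stand-in for the existing per-protocol extension lookup (assumption). */
  private final Function<String, String> findByProtocol;

  public HostProtocolMapping(Reader mapping, Function<String, String> findByProtocol)
      throws IOException {
    this.findByProtocol = findByProtocol;
    try (BufferedReader in = new BufferedReader(mapping)) {
      String line;
      while ((line = in.readLine()) != null) {
        line = line.trim();
        // Skip blank lines and '#' comments (the comment syntax is an assumption).
        if (line.isEmpty() || line.startsWith("#")) {
          continue;
        }
        // Expected format: <hostname> <tab> <plugin-name>; other lines are ignored here.
        String[] parts = line.split("\t");
        if (parts.length != 2) {
          continue;
        }
        hostToPlugin.put(parts[0].trim().toLowerCase(Locale.ROOT), parts[1].trim());
      }
    }
  }

  /** Resolves by host name first, then by protocol, caching the protocol result. */
  public String resolve(String host, String protocol) {
    String plugin = hostToPlugin.get(host.toLowerCase(Locale.ROOT));
    if (plugin != null) {
      return plugin;
    }
    return protocolCache.computeIfAbsent(protocol, findByProtocol);
  }
}
```

In this sketch the host lookup is a plain map access, and the fallback function is invoked at most once per protocol as long as it returns a non-null result. A real implementation would probably log malformed lines instead of skipping them silently.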

> Allow for per-host configurable protocol plugin
> -----------------------------------------------
>
>                 Key: NUTCH-2678
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2678
>             Project: Nutch
>          Issue Type: Improvement
>          Components: protocol
>    Affects Versions: 1.15
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Major
>             Fix For: 1.16
>
>         Attachments: NUTCH-2678.patch
>
>
> Introduces new parameter for protocol plugins called host. It takes a comma 
> separated set of host names. Protocols are resolved by hostname first, then 
> by protocol as it is now.
> {code}
>    <extension id="org.apache.nutch.protocol.http"
>               name="HttpProtocol"
>               point="org.apache.nutch.protocol.Protocol">
>       <implementation id="org.apache.nutch.protocol.http.Http"
>                        class="org.apache.nutch.protocol.http.Http">
>          <parameter name="host" value="nutch.apache.org"/>
>          <parameter name="protocolName" value="http,https"/>
>       </implementation>
>    </extension>
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
