[ 
https://issues.apache.org/jira/browse/NUTCH-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496986#comment-14496986
 ] 

Sebastian Nagel commented on NUTCH-1987:
----------------------------------------

Agreed: it's time to skip the Solr-URL because we support alternative indexing 
back-ends. And it's good to add a default Solr-URL to nutch-default.xml and 
document the property this way.
Whether or not to run the indexer is an option. Instead of still relying on a 
magic positional parameter, wouldn't it be more natural to control this with 
command-line options? For example:
{code:none}
# -i  index crawled content
# -D  <property=value>  passed to Nutch commands/tools
bin/crawl -i -D solr.server.url=http://.../solr/  urls/ crawl/ 3
# equivalent if solr.server.url is default or defined in nutch-site.xml:
bin/crawl -i urls/ crawl/ 3
# no harm in keeping this form for backward compatibility:
bin/crawl urls/ crawl/ http://.../solr/ 3
{code}
This would make the options extensible and allow adding new ones, e.g., to 
enable/disable link inversion or webgraph creation.
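
A rough sketch of how such option parsing could look in the script, just to illustrate the idea. This is not the actual bin/crawl code: the function name parse_crawl_args and the variables INDEX, NUTCH_PROPS, SEED_DIR, CRAWL_DIR, SOLR_URL and ROUNDS are all made up for this example, and property values containing spaces are not handled.

```shell
# Hypothetical sketch, not the real bin/crawl: parse leading options, then
# accept either the new 3-argument form or the legacy 4-argument form.
parse_crawl_args() {
  INDEX=false      # set to true by -i (or by the legacy Solr-URL argument)
  NUTCH_PROPS=""   # accumulated "-D property=value" pairs, space separated
  SEED_DIR=""; CRAWL_DIR=""; SOLR_URL=""; ROUNDS=""

  # Consume options until the first positional argument.
  while [ $# -gt 0 ]; do
    case "$1" in
      -i)
        INDEX=true
        shift ;;
      -D)
        if [ $# -lt 2 ]; then
          echo "-D requires <property=value>" >&2
          return 1
        fi
        NUTCH_PROPS="${NUTCH_PROPS:+$NUTCH_PROPS }-D $2"
        shift 2 ;;
      -*)
        echo "unknown option: $1" >&2
        return 1 ;;
      *)
        break ;;
    esac
  done

  case $# in
    3)  # new form: <seedDir> <crawlDir> <numberOfRounds>
      SEED_DIR=$1; CRAWL_DIR=$2; ROUNDS=$3 ;;
    4)  # legacy form: <seedDir> <crawlDir> <solrUrl> <numberOfRounds>
      SEED_DIR=$1; CRAWL_DIR=$2; SOLR_URL=$3; ROUNDS=$4
      INDEX=true
      NUTCH_PROPS="${NUTCH_PROPS:+$NUTCH_PROPS }-D solr.server.url=$SOLR_URL" ;;
    *)
      echo "Usage: crawl [-i] [-D <property=value>] <seedDir> <crawlDir> [<solrUrl>] <numberOfRounds>" >&2
      return 1 ;;
  esac
}
```

Mapping the legacy positional Solr URL onto -i plus -D solr.server.url=... keeps the old call working while the rest of the script only has to look at INDEX and NUTCH_PROPS.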

> Make bin/crawl indexer agnostic
> -------------------------------
>
>                 Key: NUTCH-1987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1987
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.9
>            Reporter: Michael Joyce
>             Fix For: 1.10
>
>
> The crawl script makes it a bit challenging to use an indexer that isn't 
> Solr. For instance, when I want to use the indexer-elastic plugin I still 
> need to call the crawler script with a fake Solr URL otherwise it will skip 
> the indexing step altogether.
> {code}
> bin/crawl urls/ crawl/ "http://fakeurl.com:9200" 1
> {code}
> It would be nice to keep configuration for the Solr indexer in the conf files 
> (to mirror the elastic search indexer conf and others) and to make the 
> indexing parameter simply toggle whether indexing does or doesn't occur 
> instead of also trying to configure the indexer at the same time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
