[jira] [Commented] (NUTCH-1047) Pluggable indexing backends

Julien Nioche (JIRA) Mon, 28 Jan 2013 02:59:18 -0800

    [ 
https://issues.apache.org/jira/browse/NUTCH-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564196#comment-13564196
 ]


Julien Nioche commented on NUTCH-1047:
--------------------------------------

Hi Tejas

It will work every time you set it in nutch-site.xml. As for setting it with -D 
in the crawl command - you definitely should not have to do that and this is 
where the bug is. The problem is that the value we take from the crawl command 
is correctly set in the configuration object, but for some reason the latter is 
reloaded or overridden during the call to JobClient.runJob(job) (IndexingJob 
line 120).
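
To make the failure mode concrete, here is a minimal sketch. It is not the 
actual Nutch or Hadoop code: plain java.util.Properties stands in for the 
configuration object, and the key name example.backend.url and every 
class/method name are hypothetical. It only shows how a value set from the 
command line can be lost if the job rebuilds its configuration from the file 
defaults instead of reusing the one it was handed.

```java
import java.util.Properties;

public class ConfigOverrideSketch {
    // Stand-in for the file-based defaults (what a site config file would supply).
    // The key "example.backend.url" is hypothetical.
    static Properties loadSiteDefaults() {
        Properties p = new Properties();
        p.setProperty("example.backend.url", "http://from-site-file");
        return p;
    }

    // Buggy variant: ignores the caller's configuration and reloads the defaults,
    // so any override applied by the caller is silently dropped.
    static String runJobReloading(Properties conf) {
        Properties fresh = loadSiteDefaults();
        return fresh.getProperty("example.backend.url");
    }

    // Expected variant: reads from the configuration it was given.
    static String runJobReusing(Properties conf) {
        return conf.getProperty("example.backend.url");
    }

    public static void main(String[] args) {
        Properties conf = loadSiteDefaults();
        // Simulates an override supplied on the command line.
        conf.setProperty("example.backend.url", "http://from-command-line");

        System.out.println("reloading: " + runJobReloading(conf));
        System.out.println("reusing:   " + runJobReusing(conf));
    }
}
```

In this sketch the "reloading" path returns the file value even though the 
override was set correctly beforehand, which matches the symptom described 
above.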

BTW the crawl command is deprecated and should be removed at some point as we 
have the crawl script. Could you try using the SOLRIndex command as well as the 
crawl script while I try and solve the problem with the crawl command?

Thanks

Julien


                
> Pluggable indexing backends
> ---------------------------
>
>                 Key: NUTCH-1047
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1047
>             Project: Nutch
>          Issue Type: New Feature
>          Components: indexer
>            Reporter: Julien Nioche
>            Assignee: Julien Nioche
>              Labels: indexing
>             Fix For: 1.7
>
>         Attachments: NUTCH-1047-1.x-v1.patch, NUTCH-1047-1.x-v2.patch, 
> NUTCH-1047-1.x-v3.patch, NUTCH-1047-1.x-v4.patch
>
>
> One possible feature would be to add a new endpoint for indexing-backends and 
> make the indexing pluggable. At the moment we are hardwired to SOLR - which is 
> OK - but as other resources like ElasticSearch are becoming more popular it 
> would be better to handle this as plugins. Not sure about the name of the 
> endpoint though: we already have indexing-plugins (which are about 
> generating fields sent to the backends) and moreover the backends are not 
> necessarily for indexing / searching but could be just an external storage 
> e.g. CouchDB. The term backend on its own would be confusing in 2.0 as this 
> could be pertaining to the storage in GORA. 'indexing-backend' is the best 
> name that came to my mind so far - please suggest better ones.
> We should come up with generic map/reduce jobs for indexing, deduplicating 
> and cleaning and maybe add a Nutch extension point there so we can easily 
> hook up indexing, cleaning and deduplicating for various backends.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
