[ 
https://issues.apache.org/jira/browse/NUTCH-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066484#comment-13066484
 ] 

Julien Nioche commented on NUTCH-1047:
--------------------------------------

{quote}
My interest in your last point is a question which I suppose is wide open to 
discussion. What end-points (generally speaking) are we going to support and 
formally represent as pluggable entities? What criteria do we make decisions 
based on?
{quote}

We'll simply port the existing SOLR indexing to the plugin-based architecture 
so that people can easily add the backends they need. If there is widespread 
demand for a specific backend, I suppose someone will contribute patches and 
they might get committed. We don't need to decide up front which backends (not 
the same as endpoints, BTW) will be added; we are just giving people the 
option of adding their own without having to do a dirty hack of the 
indexer.
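
To make the idea concrete, here is a rough sketch of what "pluggable" could look like from the indexer's side. Everything in it is hypothetical and for illustration only: IndexingBackend, InMemoryBackend, BackendRegistry and IndexerSketch are made-up names, not existing Nutch classes, and the registry merely stands in for whatever lookup the plugin system would provide.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical extension point; NOT an actual Nutch interface. */
interface IndexingBackend {
    void write(Map<String, String> doc);
    void commit();
}

/** Toy backend that keeps committed documents in memory. */
class InMemoryBackend implements IndexingBackend {
    private final List<Map<String, String>> pending = new ArrayList<>();
    final List<Map<String, String>> committed = new ArrayList<>();

    public void write(Map<String, String> doc) {
        pending.add(doc);
    }

    public void commit() {
        // Move everything written so far into the committed list.
        committed.addAll(pending);
        pending.clear();
    }
}

/** Hypothetical registry standing in for the plugin lookup. */
class BackendRegistry {
    private final Map<String, Supplier<IndexingBackend>> factories = new HashMap<>();

    void register(String name, Supplier<IndexingBackend> factory) {
        factories.put(name, factory);
    }

    IndexingBackend create(String name) {
        Supplier<IndexingBackend> factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("No indexing backend registered as: " + name);
        }
        return factory.get();
    }
}

public class IndexerSketch {
    /** Generic indexing step: it only sees the interface, never a concrete backend. */
    static void index(IndexingBackend backend, List<Map<String, String>> docs) {
        for (Map<String, String> doc : docs) {
            backend.write(doc);
        }
        backend.commit();
    }

    public static void main(String[] args) {
        BackendRegistry registry = new BackendRegistry();
        registry.register("memory", InMemoryBackend::new);

        IndexingBackend backend = registry.create("memory");
        index(backend, List.of(Map.of("url", "http://example.org/", "title", "Example")));
        System.out.println(((InMemoryBackend) backend).committed.size());
    }
}
```

With something along these lines, a SOLR or ES backend would just be another implementation registered under its own name, and the indexer itself would not need to change.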

There is currently growing interest in ElasticSearch, and I know of at least 
one person who has modified the SOLR indexer to get it to work with ES. This 
would be a good candidate for inclusion. Apart from that, let's see what people 
contribute.



 

> Pluggable indexing backends
> ---------------------------
>
>                 Key: NUTCH-1047
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1047
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: 1.4
>            Reporter: Julien Nioche
>              Labels: indexing
>             Fix For: 1.4
>
>
> One possible feature would be to add a new endpoint for indexing backends and 
> make the indexing pluggable. At the moment we are hardwired to SOLR, which is 
> OK, but as other resources like ElasticSearch are becoming more popular it 
> would be better to handle this as plugins. Not sure about the name of the 
> endpoint though: we already have indexing-plugins (which are about 
> generating fields sent to the backends) and moreover the backends are not 
> necessarily for indexing / searching but could be just an external storage 
> e.g. CouchDB. The term backend on its own would be confusing in 2.0 as this 
> could be pertaining to the storage in GORA. 'indexing-backend' is the best 
> name that came to my mind so far - please suggest better ones.
> We should come up with generic map/reduce jobs for indexing, deduplicating 
> and cleaning and maybe add a Nutch extension point there so we can easily 
> hook up indexing, cleaning and deduplicating for various end-points.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
