[ https://issues.apache.org/jira/browse/NUTCH-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julien Nioche updated NUTCH-1047:
---------------------------------
Description:
One possible feature would be to add a new endpoint for indexing backends and
make indexing pluggable. At the moment we are hardwired to SOLR, which is OK,
but as other resources like ElasticSearch become more popular it would be
better to handle these as plugins. I am not sure about the name of the endpoint,
though: we already have indexing plugins (which generate the fields sent to the
backends), and moreover the backends are not necessarily for indexing/searching
but could just be external storage, e.g. CouchDB. The term 'backend' on its own
would be confusing in 2.0, as it could refer to the storage in GORA.
'indexing-backend' is the best name that has come to my mind so far; please
suggest better ones.
We should come up with generic map/reduce jobs for indexing, deduplicating and
cleaning, and maybe add a Nutch extension point there so that we can easily
hook up indexing, cleaning and deduplicating for various backends.
was:
One possible feature would be to add a new endpoint for indexing backends and
make indexing pluggable. At the moment we are hardwired to SOLR, which is OK,
but as other resources like ElasticSearch become more popular it would be
better to handle these as plugins. I am not sure about the name of the endpoint,
though: we already have indexing plugins (which generate the fields sent to the
backends), and moreover the backends are not necessarily for indexing/searching
but could just be external storage, e.g. CouchDB. The term 'backend' on its own
would be confusing in 2.0, as it could refer to the storage in GORA.
'indexing-backend' is the best name that has come to my mind so far; please
suggest better ones.
We should come up with generic map/reduce jobs for indexing, deduplicating and
cleaning, and maybe add a Nutch extension point there so that we can easily
hook up indexing, cleaning and deduplicating for various end-points.
> Pluggable indexing backends
> ---------------------------
>
> Key: NUTCH-1047
> URL: https://issues.apache.org/jira/browse/NUTCH-1047
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: 1.4
> Reporter: Julien Nioche
> Labels: indexing
> Fix For: 1.4
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira