[
https://issues.apache.org/jira/browse/NUTCH-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445860#comment-13445860
]
Matt MacDonald commented on NUTCH-1445:
---------------------------------------
Ferdy,
Thanks for the help. I'll definitely use the mailing list first for this type
of question/issue in the future.
I was confusing 'cluster name' with 'node name' when invoking the elasticindex
command, but I am still seeing the same issue after making your suggested
change. As a result of using the proper cluster name, I can now see that the
index is added to ElasticSearch during the ElasticIndexerJob, but then it is
removed when the error is encountered:
*Index added and removed from ElasticSearch*
{noformat}
[2012-08-31 07:38:59,073][INFO ][cluster.service ] [Doorman] added {[Atleza][KBhEZMZEQqmoSALKkYLprw][inet[/192.168.1.133:9302]]{client=true, data=false},}, reason: zen-disco-receive(from master [[Doppleganger][OF5TWSbpTl64qA0_VW-b_g][inet[/192.168.1.133:9300]]])
[2012-08-31 07:39:01,140][INFO ][cluster.service ] [Doorman] removed {[Atleza][KBhEZMZEQqmoSALKkYLprw][inet[/192.168.1.133:9302]]{client=true, data=false},}, reason: zen-disco-receive(from master [[Doppleganger][OF5TWSbpTl64qA0_VW-b_g][inet[/192.168.1.133:9300]]])
{noformat}
*Still seeing the same error message*
{noformat}
2012-08-31 07:38:59,990 WARN mapred.LocalJobRunner - job_local_0001
org.elasticsearch.action.ActionRequestValidationException: Validation Failed:
1: type is missing;
{noformat}
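The message suggests that an index request was validated without a document type set. Below is a minimal sketch of that kind of check, written as a self-contained stand-in rather than Elasticsearch's own code. The class name, the method names, and the values "nutch" and "webpage" are hypothetical, and the message layout is only modeled on the log line above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for request validation; NOT the Elasticsearch classes.
public class TypeValidationSketch {

    // Collects one message per missing required field (assumed rule set).
    static List<String> validate(String index, String type) {
        List<String> errors = new ArrayList<>();
        if (index == null || index.isEmpty()) {
            errors.add("index is missing");
        }
        if (type == null || type.isEmpty()) {
            errors.add("type is missing");
        }
        return errors;
    }

    // Formats the errors in the numbered style seen in the log line above.
    static String describe(List<String> errors) {
        StringBuilder sb = new StringBuilder("Validation Failed: ");
        for (int i = 0; i < errors.size(); i++) {
            sb.append(i + 1).append(": ").append(errors.get(i)).append(";");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A request with an index name but no type fails validation.
        System.out.println(describe(validate("nutch", null)));
        // Supplying both values yields no validation errors.
        System.out.println(validate("nutch", "webpage").isEmpty());
    }
}
```

If the real check behaves like this sketch, the thing to verify is whether the indexing job passes a non-null type for every document it sends.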
Thanks,
Matt
> Add ElasticIndexerJob that indexes to elasticsearch
> ---------------------------------------------------
>
> Key: NUTCH-1445
> URL: https://issues.apache.org/jira/browse/NUTCH-1445
> Project: Nutch
> Issue Type: New Feature
> Reporter: Ferdy Galema
> Fix For: 2.1
>
> Attachments: NUTCH-1445-addPropsToConfig.patch,
> NUTCH-1445-addToNutchScript.patch, NUTCH-1445.patch
>
>
> We have created a new indexer job ElasticIndexerJob that indexes to
> elasticsearch. It is originally based upon
> https://github.com/ctjmorgan/nutch-elasticsearch-indexer (Apache2 license),
> but we have modified it greatly to make it integrate as well as possible into
> Nutch. The greatest modification is that documents are asynchronously flushed
> in bulk to elasticsearch.
> Elasticsearch rocks. Both performance and ease of configuration are awesome.
> You simply deploy a server by unpacking the tar, configure the clustername,
> start the server and fire away indexing requests. Indices are automatically
> created. Fields are automapped. (Of course it is recommended to create your
> own optimized mapping, but that is beyond the scope of this issue). Multiple
> servers connect without extra configuration, simply by using the same
> clustername. (By means of multicast). There are tons of advanced options, such
> as sharding, replication, disk striping etc.
> To give an example of the performance: With 20+ nodes we are able to index
> over 1M docs (average sized webdocuments) per minute. The best part is that
> the added documents are almost instantly searchable, so there are no hidden
> commit costs that Solr has. This is with out-of-the-box configuration.
> (I will attach patch and commit for Nutch2. Feel free to adapt for trunk.)
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira