I did not entirely solve this issue, but it looks like ES drops some
requests when it is overloaded. My Hadoop cluster can run 42 mappers at
once, so I had 42 tasks trying to send write requests to only 1 ES node (I
believe all the requests go to only one node in ES). Most of the time, many
tasks fail and the Hadoop job fails with them. But sometimes Hadoop reports
success even though not all of the data has been written.
Reducing the number of mappers should have helped, but for some reason
running pig with the property -Dmapred.tasktracker.map.tasks.maximum=1 did
not do the trick.
Limiting the number of mappers directly in the cluster conf files seems to
have solved the problem.
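
For anyone hitting the same thing, here is a rough sketch of what such a
change can look like on Hadoop 1.x. As far as I know,
mapred.tasktracker.map.tasks.maximum is read by each TaskTracker when it
starts, not per job, which would explain why passing it with -D on the pig
command line had no effect. The value 1 below is only an illustration, not
necessarily the value to use. Assuming the 42 map slots are spread evenly
over 6 TaskTracker nodes (7 per node), a value of 1 would cap the job at
about 6 concurrent mappers.

```xml
<!-- conf/mapred-site.xml on every TaskTracker node (Hadoop 1.x).
     Assumption: the value is illustrative; choose one your ES cluster can absorb.
     The TaskTrackers must be restarted for the change to take effect. -->
<configuration>
  <property>
    <!-- Maximum number of map tasks one TaskTracker runs at the same time -->
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>1</value>
  </property>
</configuration>
```

Note that this limits map slots for every job on the cluster, not only the
pig job that writes to ES.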
On Wednesday, April 23, 2014 4:15:19 PM UTC-5, Napoleon T. wrote:
>
> Hi,
>
> I'm trying to store a lot of documents into ES using pig. The pig job ends
> successfully but I end up with more documents in Elasticsearch than the
> number of rows in my input.
> My pig script is 3 lines:
> REGISTER 'local/path/to/m2.jar';
> data = load 'path/to/hdfs/file.tsv' as (field1: chararray, field2: long,
> field3: long, field4: long);
> store data into 'index/type' using
> org.elasticsearch.hadoop.pig.EsStorage('es.nodes=node2.domain.com',
> 'es.resource=index/type');
>
> I have speculative execution disabled for map and reduce when running this
> pig script.
>
>
> Hadoop states that 54,723,557 records were written (console output and
> job tracker UI).
> ES head plugin claims that I have docs: 57,344,987 (57,344,987).
>
> My environment:
> hadoop: 1.2.1 with 6 nodes cluster
> elasticsearch: 1.0.0. 6 node cluster. Different than hadoop nodes.
> elasticsearch-hadoop version M2.
> Pig version: 0.12.0
>
> Any ideas of what is going on here?
>
> Thanks.
>