Hi,
I strongly recommend upgrading to the latest release (es-hadoop 2.0 RC1), which
handles document rejections; these can and will happen when ES is
overloaded. Simply replace the jar, then keep adding tasks until you
reach the desired performance. Note that es-hadoop also records stats about
the job (docs rejected, sent, accepted, etc.) and displays them at the
end of the job, so you can use that information to double-check the
number of docs added.
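
As a rough sketch of what the script could look like once the jar is
swapped: the jar path is a placeholder, and the es.batch.* values are
illustrative assumptions rather than tuned recommendations, so please check
the option names and defaults against the configuration docs for the
es-hadoop version you install.

```pig
-- Sketch only: the jar path is a placeholder; input, index/type and host are taken from the thread.
REGISTER local/path/to/elasticsearch-hadoop.jar;

data = LOAD 'path/to/hdfs/file.tsv'
       AS (field1:chararray, field2:long, field3:long, field4:long);

-- Assumed tuning knobs: smaller bulk requests, plus more (and slower)
-- retries when ES rejects documents under load. Values are examples only.
STORE data INTO 'index/type' USING org.elasticsearch.hadoop.pig.EsStorage(
    'es.nodes=node2.domain.com',
    'es.batch.size.entries=500',
    'es.batch.write.retry.count=5',
    'es.batch.write.retry.wait=30s');
```

Smaller batches and longer retry waits trade throughput for fewer
rejections, so it is worth comparing the stats printed at the end of the job
before and after changing them.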

Cheers,


On Thu, May 15, 2014 at 10:37 PM, Napoleon T. <[email protected]> wrote:

> I did not entirely solve this issue, but it looks like ES is dropping some
> requests when it's overloaded.  As my hadoop cluster can handle 42 mappers,
> I had 42 tasks trying to send write requests to only 1 ES node (I believe
> all the requests only go to one node in ES).  Most of the time, many tasks
> fail and my hadoop job fails. But sometimes hadoop reports success even
> though not all the data has been successfully written.
> Reducing the number of mappers should have helped, but for some reason
> running pig with the property -Dmapred.tasktracker.map.tasks.maximum=1 did
> not do the trick.
> Limiting the number of mappers directly in the cluster conf files seems to
> have solved the problem.
>
> On Wednesday, April 23, 2014 4:15:19 PM UTC-5, Napoleon T. wrote:
>>
>> Hi,
>>
>> I'm trying to store a lot of documents into ES using pig. The pig job
>> ends successfully but I end up with more documents in Elasticsearch than
>> the number of rows in my input.
>> My pig script is 3 lines:
>> REGISTER 'local/path/to/m2.jar';
>> data = load 'path/to/hdfs/file.tsv' as (field1: chararray, field2: long,
>>     field3: long, field4: long);
>> store data into 'index/type' using
>>     org.elasticsearch.hadoop.pig.EsStorage('es.nodes=node2.domain.com',
>>     'es.resource=index/type');
>>
>> I have speculative execution disabled for map and reduce when running
>> this pig script.
>>
>>
>> Hadoop states that 54,723,557 records were written (console output and
>> job tracker UI).
>> ES head plugin claims that I have docs: 57,344,987 (57,344,987).
>>
>> My environment:
>> hadoop: 1.2.1 with 6 nodes cluster
>> elasticsearch: 1.0.0. 6 node cluster. Different than hadoop nodes.
>> elasticsearch-hadoop version M2.
>> Pig version: 0.12.0
>>
>> Any ideas of what is going on here?
>>
>> Thanks.
>>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/b819e576-f0ef-41d4-854a-63bab811951a%40googlegroups.com
> .
>
> For more options, visit https://groups.google.com/d/optout.
>
