Hi,

The error is a Grunt error, which suggests Pig is throwing it, not ES. What do the Pig logs say? What makes you think ES is the issue?
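One quick way to check the ES side is to look for bulk thread-pool rejections on each node (this assumes you can curl any node on port 9200; a growing "rejected" count would mean ES really is pushing back on indexing):

```shell
# List per-node bulk thread-pool stats; non-zero/growing bulk.rejected
# means ES is refusing bulk requests under load.
curl -s 'http://localhost:9200/_cat/thread_pool?v&h=host,bulk.active,bulk.queue,bulk.rejected'
```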
I know it works with smaller data, but that exercises Pig with smaller data too, not just ES, so it doesn't isolate the problem to ES.

Allan

On 21 May 2015 at 01:34, Sudhir Rao <ysud...@gmail.com> wrote:
> Hi all,
>
> I have a 4 node ES cluster running:
>
> ElasticSearch : 1.5.2
> OS : RHEL 6.x
> Java : 1.7
> CPU : 16 cores
> 2 machines : 60 GB RAM, 10 TB disk
> 2 machines : 120 GB RAM, 5 TB disk
>
> I also have a 500 node Hadoop cluster and am trying to index data from
> Hadoop which is in Avro format.
>
> Daily size : 1.2 TB
> Hourly size : 40-60 GB
>
> elasticsearch.yml config
> ==================
>
> cluster.name: zebra
> index.mapping.ignore_malformed: true
> index.merge.scheduler.max_thread_count: 1
> index.store.throttle.type: none
> index.refresh_interval: -1
> index.translog.flush_threshold_size: 1024000000
> discovery.zen.ping.unicast.hosts: ["node1","node2","node3","node4"]
> path.data: /hadoop01/es,/hadoop02/es,/hadoop03/es,/hadoop04/es,/hadoop05/es,/hadoop06/es,/hadoop07/es,/hadoop08/es,/hadoop09/es,/hadoop10/es,/hadoop11/es,/hadoop12/es
> bootstrap.mlockall: true
> indices.memory.index_buffer_size: 30%
> index.translog.flush_threshold_ops: 50000
> index.store.type: mmapfs
>
> Cluster Settings
> ============
>
> $ curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
> {
>   "cluster_name" : "zebra",
>   "status" : "green",
>   "timed_out" : false,
>   "number_of_nodes" : 4,
>   "number_of_data_nodes" : 4,
>   "active_primary_shards" : 21,
>   "active_shards" : 22,
>   "relocating_shards" : 0,
>   "initializing_shards" : 0,
>   "unassigned_shards" : 0,
>   "number_of_pending_tasks" : 0
> }
>
> Pig Script:
> ========
>
> avro_data = LOAD '$INPUT_PATH' USING AvroStorage();
>
> temp_projection = FOREACH avro_data GENERATE
>     our.own.udf.ToJsonString(headers,data) as data;
>
> STORE temp_projection INTO 'fpti/raw_data' USING
>     org.elasticsearch.hadoop.pig.EsStorage(
>         'es.resource = fpti/raw_data', 'es.input.json=true',
>         'node1,node2,node3,node4',
>         'mapreduce.map.speculative=false', 'mapreduce.reduce.speculative=false',
>         'es.batch.size.bytes=512mb', 'es.batch.size.entries=1');
>
> When I run the above, there are around 300 mappers; none of them complete, and the job fails every time with the error below. Some documents do get indexed, though.
>
> *Error:*
>
> *2015-05-20 15:40:20,618 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2999: Unexpected internal error. Could not write all entries [1/8448] (maybe ES was overloaded?). Bailing out...*
>
> The job does finish, however, when the data size is only a few thousand entries.
>
> Please let me know what else I can do to increase my indexing throughput.
>
> regards
>
> #sudhir
>
> --
> Please update your bookmarks! We have moved to https://discuss.elastic.co/
> ---
> You received this message because you are subscribed to the Google Groups "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/6312a8b6-bde7-40d6-bbf0-8b3fccf7cd12%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
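A side note on the STORE statement above: EsStorage options are normally passed as 'key=value' strings (the bare 'node1,node2,node3,node4' argument has no key, so the node list is likely being ignored), and es.batch.size.entries=1 flushes after every document. A hedged sketch using es-hadoop's documented option names and default-sized batches (values here are illustrative, not a drop-in fix for this cluster):

```pig
-- Illustrative tuning sketch: name the node list under es.nodes and use
-- moderate bulk batches; exact values depend on cluster capacity.
STORE temp_projection INTO 'fpti/raw_data' USING
    org.elasticsearch.hadoop.pig.EsStorage(
        'es.nodes=node1,node2,node3,node4',
        'es.input.json=true',
        'es.batch.size.bytes=1mb',
        'es.batch.size.entries=1000',
        'es.batch.write.retry.count=3');
```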