Hi all, I have a 4-node ES cluster running:
ElasticSearch : 1.5.2
OS : RHEL 6.x
Java : 1.7
CPU : 16 cores
2 machines : 60 GB RAM, 10 TB disk
2 machines : 120 GB RAM, 5 TB disk

I also have a 500-node Hadoop cluster and am trying to index data from Hadoop that is in Avro format.
Daily size : 1.2 TB
Hourly size : 40-60 GB

elasticsearch.yml config
==================
cluster.name: zebra
index.mapping.ignore_malformed: true
index.merge.scheduler.max_thread_count: 1
index.store.throttle.type: none
index.refresh_interval: -1
index.translog.flush_threshold_size: 1024000000
discovery.zen.ping.unicast.hosts: ["node1","node2","node3","node4"]
path.data: /hadoop01/es,/hadoop02/es,/hadoop03/es,/hadoop04/es,/hadoop05/es,/hadoop06/es,/hadoop07/es,/hadoop08/es,/hadoop09/es,/hadoop10/es,/hadoop11/es,/hadoop12/es
bootstrap.mlockall: true
indices.memory.index_buffer_size: 30%
index.translog.flush_threshold_ops: 50000
index.store.type: mmapfs

Cluster Settings
============
$ curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "zebra",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 4,
  "number_of_data_nodes" : 4,
  "active_primary_shards" : 21,
  "active_shards" : 22,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "number_of_pending_tasks" : 0
}

Pig Script:
========
avro_data = LOAD '$INPUT_PATH' USING AvroStorage();
temp_projection = FOREACH avro_data GENERATE our.own.udf.ToJsonString(headers, data) AS data;
STORE temp_projection INTO 'fpti/raw_data' USING org.elasticsearch.hadoop.pig.EsStorage(
    'es.resource=fpti/raw_data', 'es.input.json=true',
    'es.nodes=node1,node2,node3,node4',
    'mapreduce.map.speculative=false', 'mapreduce.reduce.speculative=false',
    'es.batch.size.bytes=512mb', 'es.batch.size.entries=1');

When I run the above, around 300 mappers start, none of them complete, and the job fails every time with the error below. Some documents do get indexed, though.

Error:
2015-05-20 15:40:20,618 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2999: Unexpected internal error. Could not write all entries [1/8448] (maybe ES was overloaded?). Bailing out...

The job does finish, however, when the input is only a few thousand documents.

Please let me know what else I can do to increase my indexing throughput. A few things I have been poking at myself are in the P.S. below.

regards
#sudhir
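
P.S. Some things I have been looking at while debugging this, in case it helps:

1) Checking whether the nodes are actually rejecting bulk requests, which is what the "maybe ES was overloaded?" message hints at. This is just the stock _cat API (column names taken from the 1.x docs):

$ curl 'http://localhost:9200/_cat/thread_pool?v&h=host,bulk.active,bulk.queue,bulk.rejected'

If bulk.rejected keeps climbing during a run, the bulk queue (50 requests per node by default in 1.x, if I read the docs right) is overflowing and the cluster simply cannot absorb 300 mappers' worth of concurrent bulks.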
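2) Shrinking the bulk requests and adding retries on the es-hadoop side. es.batch.size.bytes=512mb is 500x the es-hadoop default of 1mb, and es.batch.size.entries=1 also looks off (the default is 1000). The variant below is what I plan to try next; the retry settings are from the es-hadoop configuration docs, and the exact values are guesses on my part, not recommendations:

STORE temp_projection INTO 'fpti/raw_data' USING org.elasticsearch.hadoop.pig.EsStorage(
    'es.resource=fpti/raw_data', 'es.input.json=true',
    'es.nodes=node1,node2,node3,node4',
    'es.batch.size.bytes=10mb',        -- flush a bulk request at 10 MB instead of 512 MB
    'es.batch.size.entries=5000',      -- or at 5000 docs, whichever limit is hit first
    'es.batch.write.retry.count=5',    -- retry rejected bulks instead of bailing out
    'es.batch.write.retry.wait=30s',   -- back off between retries
    'mapreduce.map.speculative=false', 'mapreduce.reduce.speculative=false');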
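3) Since index.refresh_interval is -1 for the load, I put refresh back once a run finishes; otherwise newly indexed data does not show up in searches until something triggers a refresh. Plain index-settings API, with the index name matching my es.resource:

$ curl -XPUT 'http://localhost:9200/fpti/_settings' -d '{
  "index" : { "refresh_interval" : "30s" }
}'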