Hi all,

I have been testing an upgrade to elasticsearch 1.4 beta1.

We use the Bulk API along with scripts to perform upserts into 
elasticsearch.  These perform well under ES 1.2 without any tuning.

However, in ES 1.4 beta1, running these upserts often leads to: 
  java.lang.OutOfMemoryError: Java heap space
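
In case it helps anyone reproduce this, a heap dump can be captured 
automatically on OOM with the standard HotSpot flags (a sketch; it assumes 
bin/elasticsearch picks up ES_JAVA_OPTS):

  ES_HEAP_SIZE=2g \
  ES_JAVA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/es-oom.hprof" \
  ./bin/elasticsearch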


We use the bulk API:

  curl -iL --silent --show-error -XPOST 'localhost:9200/_bulk' --data-binary 
@./<file_name>


where the file contains about 130 MB (10,000 to 250,000 lines) of data. 
It is filled with update / script commands:

{"update":{"_index":"2762_2014_41","_type":"event","_id":"97bc142e15c7136ebe866890e03dfad9"}}
{"doc":
  
{"type":"event","date_time":"2014-10-17T19:00:00Z","day":20141017,"impression_cost":0.005,"format":"xyz","impression":1,"referer":"xyz","browser":"xyz","os":"android
 
4.4.4","device":"nexus 
4","channel":"mobile","x_name":"xyz","id":"97bc142e15c7136ebe866890e03dfad9"
  },"doc_as_upsert":true
}


{"update":{"_index":"2762_2014_41","_type":"event","_id":"97bc142e15c7136ebe866890e03dfad9"}}
{
  "script":"if( ctx._source.containsKey(\"impression\") ){ 
ctx._source.impression += 2; } else { ctx._source.impression = 2; };"
}
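
If it matters, the same update can be written so that every item shares one 
identical script string by passing the increment through "params" (a sketch; 
the "count" name is my own). My understanding is that an identical script is 
compiled and cached once, rather than once per distinct script string:

{"update":{"_index":"2762_2014_41","_type":"event","_id":"97bc142e15c7136ebe866890e03dfad9"}}
{"script":"if( ctx._source.containsKey(\"impression\") ){ ctx._source.impression += count; } else { ctx._source.impression = count; };","params":{"count":2}}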



There were some issues with PermGen taking up memory in this ticket that 
have been addressed since the beta1 release, so we rebuilt from the 
1.4 branch:
https://github.com/elasticsearch/elasticsearch/issues/7658


And I found this discussion about an OOM error that suggested setting 
max_merged_segment in elasticsearch.yml:
https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/max_merged_segment/elasticsearch/ETjvBVUvCJs/ZccfzUIFAKoJ

  index.merge.policy.max_merged_segment: 1gb
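
Since the merge policy settings are dynamic, the same limit can also be 
applied per index at runtime (a sketch against one of our indices):

  curl -XPUT 'localhost:9200/2762_2014_41/_settings' -d '{
    "index.merge.policy.max_merged_segment": "1gb"
  }'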


Setting max_merged_segment, launching on my development machine with a 
2 GB heap (ES_HEAP_SIZE=2g ./bin/elasticsearch), and bringing the file size 
per bulk request down to about 25 MB stabilized the system.
However, it would still heap dump when larger files of around 130 MB were allowed.
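
For reference, this is roughly how the per-request size can be capped (a 
sketch; the 20,000-line chunk size and file names are my own, it assumes 
every action line is followed by exactly one source line, and jq is only 
there to surface per-item failures):

  # Split the bulk file into 20,000-line chunks; an even line count
  # keeps each action line together with its source line.
  awk 'NR % 20000 == 1 { file = sprintf("chunk_%04d.json", ++n) } { print > file }' bulk_file.json

  # Send each chunk and check the response for per-item errors.
  for f in chunk_*.json; do
    curl --silent --show-error -XPOST 'localhost:9200/_bulk' --data-binary @"$f" | jq '.errors'
  done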


I don't fully understand how this fixed the memory issues.  Would anyone be 
able to provide some insight into why we would run into memory issues with 
the upgrade?
I'd like to better understand how memory is managed here so that I can 
support this in production.  Are there recommended sizes for bulk requests? 
And how do those relate to the max_merged_segment size?


Thanks,
Dave

 
