Elasticsearch uses several areas of memory when it receives large bulk
requests over HTTP:

- Netty buffers (HTTP chunking etc.)

- bulk source (the lines are split into portions for each primary shard)

- memory for analyzing/tokenizing the fields in the source

- translog buffer (ES write-ahead log)

- indexing buffer (Lucene NRT etc.)

The longer a bulk request runs, the more these areas compete for the 2g heap.
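The second item in the list, splitting the bulk source into one portion per primary shard, can be pictured with a minimal Python sketch. It is an illustration only: split_by_shard is a hypothetical helper, zlib.crc32 stands in for Elasticsearch's real routing hash, and it assumes every action carries an explicit _id and is followed by exactly one source line (true for update, not for delete).

```python
import json
import zlib
from collections import defaultdict
from typing import Dict, List


def split_by_shard(bulk_body: str, num_primary_shards: int) -> Dict[int, List[str]]:
    """Group the action/source line pairs of an NDJSON bulk body by target shard.

    Hypothetical helper for illustration: zlib.crc32 stands in for
    Elasticsearch's real routing hash, so actual shard numbers will differ.
    """
    # Drop blank lines; every remaining line is one JSON object.
    lines = [line for line in bulk_body.split("\n") if line.strip()]
    portions: Dict[int, List[str]] = defaultdict(list)
    # Assumes each action line is followed by exactly one source line.
    for action_line, source_line in zip(lines[0::2], lines[1::2]):
        action = json.loads(action_line)
        meta = next(iter(action.values()))  # metadata under "update", "index", ...
        shard = zlib.crc32(meta["_id"].encode("utf-8")) % num_primary_shards
        # Appending keeps the original order of requests within each shard.
        portions[shard].extend([action_line, source_line])
    return dict(portions)
```

Order within each portion is preserved, which matters when several updates target the same _id, as in the upsert-then-script pairs quoted below.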

If you run sustained bulk requests for some time (say 15 to 20 minutes), ES
picks up the segments created on disk and merges them into larger ones to
maintain performance.

Reducing index.merge.policy.max_merged_segment from its default of 5gb to
1gb has two effects. It lets a merge step complete faster because the size
of a merged segment is capped, and it takes some pressure off the heap when
segments would otherwise grow larger and larger. The downside is that merge
steps run more frequently.
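A back-of-the-envelope sketch of the first effect: if a merge step writes at most about max_merged_segment bytes and merge I/O runs at a fixed, throttled rate, the cap bounds how long a single step can take. The 20 MB/s rate below is an assumed figure for illustration, not a measured or default value, and max_merge_step_seconds is a hypothetical helper.

```python
def max_merge_step_seconds(max_merged_segment_bytes: int, merge_bytes_per_sec: int) -> float:
    """Rough upper bound on the time one merge step spends writing data.

    Simplifying assumptions: a merge never writes more than the
    max_merged_segment cap, and merge I/O runs at a constant throttled rate.
    """
    return max_merged_segment_bytes / merge_bytes_per_sec


MB = 1024 ** 2
GB = 1024 ** 3
ASSUMED_MERGE_RATE = 20 * MB  # assumption for illustration only

for label, cap in (("5gb (default)", 5 * GB), ("1gb", 1 * GB)):
    seconds = max_merge_step_seconds(cap, ASSUMED_MERGE_RATE)
    print(f"{label}: ~{seconds:.0f} s per merge step at most")
# 5gb (default): ~256 s per merge step at most
# 1gb: ~51 s per merge step at most
```

The same arithmetic shows the trade-off: with a lower cap each step finishes sooner, but more steps are needed to absorb the same indexing volume.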

You are correct: bulk requests of around 1-10MB should work well on most
servers.

Bulk requests of 100MB and larger have a strong effect on the run time and
memory consumption of the other ES processing steps needed to index the
data, so they should be reduced until you find a "sweet spot". Where exactly
the optimal balance between bulk request input and indexing power lies also
depends on other factors, such as I/O throughput and CPU (plus ES settings
like store throttling).
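One way to stay in the 1-10MB range without changing how the bulk files are produced is to cut them into smaller requests on the client side. A minimal Python sketch, assuming the file is proper newline-delimited JSON in which every action line is followed by exactly one source line (true for the update/doc and update/script pairs quoted below, not for delete actions); chunk_bulk_lines is a hypothetical helper, not part of any Elasticsearch client.

```python
from typing import Iterator, List


def chunk_bulk_lines(lines: List[str], max_bytes: int) -> Iterator[bytes]:
    """Yield NDJSON bulk bodies of at most max_bytes each.

    Action/source pairs are never split across two bodies. A single pair
    larger than max_bytes is yielded on its own rather than dropped.
    """
    chunk: List[bytes] = []
    size = 0
    # Step through the lines two at a time: action line, then source line.
    for i in range(0, len(lines), 2):
        pair = ("\n".join(lines[i:i + 2]) + "\n").encode("utf-8")
        # Start a new body if adding this pair would exceed the limit.
        if chunk and size + len(pair) > max_bytes:
            yield b"".join(chunk)
            chunk, size = [], 0
        chunk.append(pair)
        size += len(pair)
    if chunk:
        yield b"".join(chunk)
```

Each yielded body ends with a newline, as the bulk API expects, and can be POSTed to localhost:9200/_bulk one request at a time, e.g. with max_bytes = 10 * 1024 * 1024 after reading the file and dropping blank lines.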

Jörg



On Tue, Oct 28, 2014 at 4:41 PM, <[email protected]> wrote:

> Hi all,
>
> I have been testing an upgrade to elasticsearch 1.4 beta1.
>
> We use the Bulk API along with scripts to perform upserts into
> elasticsearch.  These perform well under ES 1.2 without any tuning.
>
> However, in ES 1.4 beta1, running these upsert scripts often leads to:
>   java.lang.OutOfMemoryError: Java heap space
>
>
> We use the bulk API:
>
>   curl -iL --silent --show-error -XPOST 'localhost:9200/_bulk' --data-binary @./<file_name>
>
>
> where the file contains about 130 MB (10,000 to 250,000 lines) of data.
> It is filled with update / script commands:
>
>
> {"update":{"_index":"2762_2014_41","_type":"event","_id":"97bc142e15c7136ebe866890e03dfad9"}}
> {"doc":{"type":"event","date_time":"2014-10-17T19:00:00Z","day":20141017,"impression_cost":0.005,"format":"xyz","impression":1,"referer":"xyz","browser":"xyz","os":"android 4.4.4","device":"nexus 4","channel":"mobile","x_name":"xyz","id":"97bc142e15c7136ebe866890e03dfad9"},"doc_as_upsert":true}
> {"update":{"_index":"2762_2014_41","_type":"event","_id":"97bc142e15c7136ebe866890e03dfad9"}}
> {"script":"if( ctx._source.containsKey(\"impression\") ){ ctx._source.impression += 2; } else { ctx._source.impression = 2; };"}
>
>
>
> There were some issues with permgen taking up memory in this ticket
> that have been addressed since the beta1 release, so we re-built from the
> 1.4 branch:
> https://github.com/elasticsearch/elasticsearch/issues/7658
>
>
> And I found this discussion about an OOM error that suggested including
> the max_merged_segment in elasticsearch.yml.
>
> https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/max_merged_segment/elasticsearch/ETjvBVUvCJs/ZccfzUIFAKoJ
>
>   index.merge.policy.max_merged_segment: 1gb
>
>
> Setting max_merged_segment, launching on my development machine with a
> 2 GB heap (ES_HEAP_SIZE=2g ./bin/elasticsearch), and bringing the file
> size per bulk request down to about 25 MB stabilized the system.
> However, it would still heap dump when larger files like 130 MB were
> allowed.
>
>
> I don't fully understand how this fixed the memory issues. Would anyone
> be able to provide some insight into why we would run into memory issues
> with the upgrade?
> I'd like to better understand how the memory is managed here so that I can
> support this in production. Are there recommended sizes for bulk
> requests? And how do those relate to the max_merged_segment size?
>
>
> Thanks,
> Dave
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/d5845815-eb21-41c0-b899-96626dce577e%40googlegroups.com
> For more options, visit https://groups.google.com/d/optout.
>
