Hi,
I have configured a single-node ES instance with Logstash 1.4.0 (8GB memory) and
the following settings:
- index.number_of_shards: 7
- number_of_replicas: 0
- refresh_interval: -1
- translog.flush_threshold_ops: 100000
- merge.policy.merge_factor: 30
- codec.bloom.load: false
- min_shard_index_buffer_size: 12m
- compound_format: true
- indices.fielddata.cache.size: 15%
- indices.fielddata.cache.expire: 5m
- indices.cache.filter.size: 15%
- indices.cache.filter.expire: 5m
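For clarity, here is a minimal sketch of how I believe these look with their full
names in an ES 1.x elasticsearch.yml; the prefixes (index., indices.memory.) are my
reconstruction of the abbreviated names above, so treat them as approximate:

  # elasticsearch.yml (ES 1.x) -- prefixes reconstructed, approximate
  index.number_of_shards: 7
  index.number_of_replicas: 0
  index.refresh_interval: -1
  index.translog.flush_threshold_ops: 100000
  index.merge.policy.merge_factor: 30
  index.codec.bloom.load: false
  index.compound_format: true
  indices.memory.min_shard_index_buffer_size: 12m
  indices.fielddata.cache.size: 15%
  indices.fielddata.cache.expire: 5m
  indices.cache.filter.size: 15%
  indices.cache.filter.expire: 5m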
Machine: 16GB RAM, Intel i7-2600 CPU @ 3.4GHz.
OS: 64-bit Windows Server 2012 R2
My raw data is a CSV file, and I use a grok filter to parse it, with this output
configuration:
  elasticsearch { embedded => true  flush_size => 100000  idle_flush_time => 30 }
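Roughly, the whole pipeline looks like the sketch below; the file path and grok
pattern are simplified placeholders rather than my real ones:

  input {
    file {
      # placeholder path, not the real location of the CSV files
      path => "C:/data/events/*.csv"
    }
  }
  filter {
    grok {
      # placeholder pattern; the real one parses the CSV columns
      match => [ "message", "%{GREEDYDATA:raw_event}" ]
    }
  }
  output {
    elasticsearch {
      embedded => true
      flush_size => 100000
      idle_flush_time => 30
    }
  }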
Raw data volume is about 100GB of events per day, which ES indexes into a single
index (with 7 shards).
At the beginning indexing was fast; however, after a while it got extremely slow:
1.5K docs in 8K seconds :(
Currently the index has around 140 million docs, with a size of 55GB.
When I analyzed the disk writes with ProcMon, I saw that the process writes to
three kinds of files (.tim, .doc, and .pos) in an interleaved manner, in 4K and 8K
chunks, instead of batching the writes into some reasonable size.
Appreciate the help.
All the best,
Yitzhak