Hi,

I have configured a single-node ES with Logstash 1.4.0 (8 GB memory) and the following settings:

   - index.number_of_shards: 7
   - number_of_replicas: 0
   - refresh_interval: -1
   - translog.flush_threshold_ops: 100000
   - merge.policy.merge_factor: 30
   - codec.bloom.load: false
   - min_shard_index_buffer_size: 12m
   - compound_format: true
   - indices.fielddata.cache.size: 15%
   - indices.fielddata.cache.expire: 5m
   - indices.cache.filter.size: 15%
   - indices.cache.filter.expire: 5m
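
Spelled out as a single elasticsearch.yml block, the list above is roughly the following (the full index.* / indices.memory.* prefixes are my expansion of the shorthand names, so treat this as a sketch rather than a verbatim copy of the config):

   # Sketch of the equivalent elasticsearch.yml entries; the index.* and
   # indices.memory.* prefixes are expanded from the shorthand list above.
   index.number_of_shards: 7
   index.number_of_replicas: 0
   index.refresh_interval: -1
   index.translog.flush_threshold_ops: 100000
   index.merge.policy.merge_factor: 30
   index.codec.bloom.load: false
   indices.memory.min_shard_index_buffer_size: 12m
   index.compound_format: true
   indices.fielddata.cache.size: 15%
   indices.fielddata.cache.expire: 5m
   indices.cache.filter.size: 15%
   indices.cache.filter.expire: 5m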
   
Machine: 16 GB RAM, Intel i7-2600 CPU @ 3.4 GHz.
OS: 64-bit Windows Server 2012 R2

My raw data is a CSV file, which I parse with a grok filter and send to Elasticsearch with the output configuration (elasticsearch { embedded => true flush_size => 100000 idle_flush_time => 30 }).
The raw data amounts to about 100 GB of events per day, which ES ingests into a single index (with 7 shards).
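
For completeness, the whole Logstash pipeline looks roughly like this (the file path and the grok pattern below are simplified placeholders, not my real ones; the elasticsearch output options are exactly the ones quoted above):

input {
  file {
    # placeholder path -- the real input reads the daily CSV files
    path => "C:/data/*.csv"
  }
}

filter {
  grok {
    # placeholder pattern -- the real one parses the CSV columns
    match => [ "message", "%{DATA:col1},%{DATA:col2},%{GREEDYDATA:rest}" ]
  }
}

output {
  elasticsearch {
    embedded => true
    flush_size => 100000
    idle_flush_time => 30
  }
}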

At the beginning the inserts were fast, but after a while they became 
extremely slow: about 1.5K docs in 8K seconds :(

Currently the index holds around 140 million docs with a size of 55 GB.

 

When I analyzed the disk writes with ProcMon, I saw that the process writes 
to three kinds of files (.tim, .doc, and .pos) in an interleaved manner, in 
4K and 8K chunks, instead of batching the writes into some reasonably large 
size.

 

I'd appreciate any help.

 

All the best,

Yitzhak
