Answers inline. Regarding the slow I/O: when I analyzed the creation of the Lucene index files, I saw that they are created without any special flags (such as no-buffering or write-through). This means we pay the cost twice: when we write a file, the data is cached in Windows' Cache Manager, which consumes a lot of memory (memory that is then not available to the application itself), yet when we read the file back we don't actually benefit from that cache, which makes the operation slow. *Any ideas?*
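For what it's worth, here is a minimal Java 7 sketch of the distinction (the file names and sizes are made up, and this is an illustration only, not Lucene code). As far as I can tell, StandardOpenOption.DSYNC maps to FILE_FLAG_WRITE_THROUGH on Windows, while Java 7's public API has no equivalent of FILE_FLAG_NO_BUFFERING at all:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class WriteThroughDemo {
    public static void main(String[] args) throws IOException {
        ByteBuffer block = ByteBuffer.allocate(8 * 1024); // one 8K segment

        // Default open: every write lands in the Windows Cache Manager first,
        // consuming system cache memory on top of the JVM's own heap.
        try (FileChannel cached = FileChannel.open(Paths.get("cached.bin"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            cached.write(block);
        }

        block.rewind(); // reuse the same buffer for the second write

        // DSYNC requests write-through, so the data is pushed to the device
        // instead of lingering dirty in the cache. Java 7 exposes no public
        // flag for fully unbuffered I/O (FILE_FLAG_NO_BUFFERING); that would
        // need native code.
        try (FileChannel through = FileChannel.open(Paths.get("writethrough.bin"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.DSYNC)) {
            through.write(block);
        }
    }
}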
On Wednesday, April 9, 2014 5:28:11 PM UTC+3, Itamar Syn-Hershko wrote:

> Shooting in the dark here, but here it goes:
>
> 1. Do you have anything else running on the system? For example, AVs are
> known to cause slowdowns for such services, and other I/O- or memory-heavy
> services could cause thrashing or just a general slowdown.

No, nothing else is running on that machine. Initially it was working fast; it got slower with the amount of data now in the index. Also, is there a way to increase the buffer size for the Lucene index files (.tim, .doc, and .pos) from 8K to something much bigger? (A sketch of the batching idea I have in mind is at the end of this message.)

> 2. What JVM version are you running this with?

java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
OS_NAME="Windows" OS_VERSION="5.2" OS_ARCH="amd64"

> 3. If you changed any of the default settings for merge factors etc. - can
> you revert that and try again?

Tried before; same behavior.

> 4. Can you try with embedded=false and see if it makes a difference?

Tried before; same behavior.

> --
> Itamar Syn-Hershko
> http://code972.com | @synhershko <https://twitter.com/synhershko>
> Freelance Developer & Consultant
> Author of RavenDB in Action <http://manning.com/synhershko/>
>
> On Wed, Apr 9, 2014 at 4:11 PM, Yitzhak Kesselman <[email protected]> wrote:
>
>> Hi,
>>
>> I have configured a single-node ES with logstash 1.4.0 (8GB memory) with
>> the following configuration:
>>
>> - index.number_of_shards: 7
>> - number_of_replicas: 0
>> - refresh_interval: -1
>> - translog.flush_threshold_ops: 100000
>> - merge.policy.merge_factor: 30
>> - codec.bloom.load: false
>> - min_shard_index_buffer_size: 12m
>> - compound_format: true
>> - indices.fielddata.cache.size: 15%
>> - indices.fielddata.cache.expire: 5m
>> - indices.cache.filter.size: 15%
>> - indices.cache.filter.expire: 5m
>>
>> Machine: 16GB RAM, Intel i7-2600 CPU @ 3.4GHz.
>> OS: 64-bit Windows Server 2012 R2
>>
>> My raw data is a CSV file, and I use grok as a filter to parse it, with
>> the output configuration (elasticsearch { embedded => true flush_size =>
>> 100000 idle_flush_time => 30 }).
>> Raw data volume is about 100GB of events per day, which ES tries to insert
>> into one index (with 7 shards).
>>
>> At the beginning the insert was fast; however, after a while it got
>> extremely slow: 1.5K docs in 8K seconds :(
>>
>> Currently the index has around 140 million docs with a size of 55GB.
>>
>> When I analyzed the writes to disk with ProcMon, I saw that the process
>> writes in an interleaved manner to three kinds of files (.tim, .doc, and
>> .pos) in 4K and 8K segments, instead of batching the writes into some
>> reasonable size.
>>
>> Appreciate the help.
>>
>> All the best,
>> Yitzhak
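On my buffer-size question above: plugging a bigger buffer into Lucene itself would need a custom Directory/IndexOutput, which I haven't written, but a plain-JDK sketch shows the principle I mean. The buffer size, file name, and 8K chunk size are just assumptions for the example: coalesce many small writes in user space so the OS sees a few large sequential writes instead of thousands of 4K/8K ones.

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class BatchedWriteDemo {
    public static void main(String[] args) throws IOException {
        byte[] segment = new byte[8 * 1024]; // 8K chunk, like the writes ProcMon shows

        // 1MB user-space buffer: the OS sees one large write per 128 segments
        // instead of 128 small interleaved ones.
        try (OutputStream out = new BufferedOutputStream(
                new FileOutputStream("batched.bin"), 1024 * 1024)) {
            for (int i = 0; i < 128; i++) {
                out.write(segment); // stays in the buffer; no syscall until it fills
            }
        } // close() flushes whatever is left
    }
}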
