Can you try batching your writes to Elasticsearch? See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html
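Something along these lines - a rough sketch using the official Python
client; the index name, doc type, file name, and chunk size below are
placeholders, not taken from your setup:

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()  # defaults to localhost:9200

def gen_actions(path):
    # One bulk "index" action per CSV line.
    with open(path) as f:
        for line in f:
            yield {
                "_index": "logstash-2014.04.09",  # placeholder index name
                "_type": "logs",                  # placeholder doc type
                "_source": {"message": line.rstrip("\n")},
            }

# One HTTP round-trip per chunk of 5000 docs instead of one per document.
helpers.bulk(es, gen_actions("events.csv"), chunk_size=5000)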
--
Itamar Syn-Hershko
http://code972.com | @synhershko <https://twitter.com/synhershko>
Freelance Developer & Consultant
Author of RavenDB in Action <http://manning.com/synhershko/>


On Wed, Apr 9, 2014 at 10:33 PM, Yitzhak Kesselman <[email protected]> wrote:

> Attached is the index rate (using bigdesk):
>
> <https://lh5.googleusercontent.com/-Jve-j75qB9o/U0WgK5ZMvMI/AAAAAAAAAFo/5_WZuCryeRw/s1600/bigdesk.png>
>
> The indexing requests per second are around 2K, and the indexing time per
> second is around 3K.
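To cross-check what bigdesk shows, here is a quick sketch that samples the
indexing stats twice and derives the rates (Python client again; the index
name is a placeholder):

import time
from elasticsearch import Elasticsearch

es = Elasticsearch()

def sample(index):
    # Running totals from the index stats API.
    s = es.indices.stats(index=index)["indices"][index]["total"]["indexing"]
    return s["index_total"], s["index_time_in_millis"]

docs1, ms1 = sample("logstash-2014.04.09")
time.sleep(10)
docs2, ms2 = sample("logstash-2014.04.09")

print("docs indexed per second:", (docs2 - docs1) / 10.0)
print("ms spent indexing per wall-clock second:", (ms2 - ms1) / 10.0)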
>
>
> On Wednesday, April 9, 2014 9:36:12 PM UTC+3, Yitzhak Kesselman wrote:
>
>> Answers inline.
>>
>> Regarding the slow I/O: when I analyzed the creation of the Lucene index
>> files, I saw that they are created without any special flags (such as no
>> buffering or write-through). This means we're paying the cost twice:
>> when we write the file we cache its data in Windows' Cache Manager,
>> which takes a lot of memory (memory that is then not available to the
>> application itself), but when we read the file we don't actually read it
>> through the cache, which makes the operation slow. *Any ideas?*
>>
>>
>> On Wednesday, April 9, 2014 5:28:11 PM UTC+3, Itamar Syn-Hershko wrote:
>>
>>> Shooting in the dark here, but here it goes:
>>>
>>> 1. Do you have anything else running on the system? For example, AVs
>>> are known to cause slow-downs for such services, and other I/O- or
>>> memory-heavy services could cause thrashing or just a general slowdown.
>>>
>> No, nothing else is running on that machine. Initially it was working
>> fast; it got slower as the amount of data in the index grew. Also, is
>> there a way to increase the buffer size for the Lucene index files
>> (.tim, .doc, and .pos) from 8K to something much bigger?
>>
>>> 2. What JVM version are you running this with?
>>>
>> java version "1.7.0_51"
>> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
>> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
>> OS_NAME="Windows"
>> OS_VERSION="5.2"
>> OS_ARCH="amd64"
>>
>>> 3. If you changed any of the default settings for merge factors etc. -
>>> can you revert that and try again?
>>>
>> Tried that before; same behavior.
>>
>>> 4. Can you try with embedded=false and see if it makes a difference?
>>>
>> Tried that before; same behavior.
>>
>>> --
>>> Itamar Syn-Hershko
>>> http://code972.com | @synhershko <https://twitter.com/synhershko>
>>> Freelance Developer & Consultant
>>> Author of RavenDB in Action <http://manning.com/synhershko/>
>>>
>>>
>>> On Wed, Apr 9, 2014 at 4:11 PM, Yitzhak Kesselman <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have configured a single-node ES with logstash 1.4.0 (8GB memory)
>>>> with the following configuration:
>>>>
>>>> - index.number_of_shards: 7
>>>> - number_of_replicas: 0
>>>> - refresh_interval: -1
>>>> - translog.flush_threshold_ops: 100000
>>>> - merge.policy.merge_factor: 30
>>>> - codec.bloom.load: false
>>>> - min_shard_index_buffer_size: 12m
>>>> - compound_format: true
>>>> - indices.fielddata.cache.size: 15%
>>>> - indices.fielddata.cache.expire: 5m
>>>> - indices.cache.filter.size: 15%
>>>> - indices.cache.filter.expire: 5m
>>>>
>>>> Machine: 16GB RAM, Intel i7-2600 CPU @ 3.4GHz.
>>>> OS: 64-bit Windows Server 2012 R2
>>>>
>>>> My raw data is a CSV file, and I use grok as a filter to parse it,
>>>> with the output configuration
>>>> (elasticsearch { embedded => true flush_size => 100000 idle_flush_time => 30 }).
>>>> Raw data size is about 100GB of events per day, which ES tries to
>>>> ingest into one index (with 7 shards).
>>>>
>>>> At the beginning the inserts were fast; however, after a while they
>>>> got extremely slow - 1.5K docs in 8K seconds :(
>>>>
>>>> Currently the index has around 140 million docs with a size of 55GB.
>>>>
>>>> When I analyzed the writes to disk with ProcMon, I saw that the
>>>> process writes to three kinds of files (.tim, .doc, and .pos) in an
>>>> interleaved manner, in 4K and 8K chunks, instead of batching the
>>>> writes into some reasonably large size.
>>>>
>>>> Appreciate the help.
>>>>
>>>> All the best,
>>>> Yitzhak
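One more thing worth trying around those settings: the usual bulk-load
sequence via the index settings API - refresh off while loading, then
restored afterwards, plus an optimize to merge the segments down. A minimal
sketch (Python client; the index name and segment count are placeholders):

from elasticsearch import Elasticsearch

es = Elasticsearch()
index = "logstash-2014.04.09"  # placeholder

# Before the load: stop refreshes so new segments aren't reopened constantly.
es.indices.put_settings(index=index, body={"index": {"refresh_interval": "-1"}})

# ... run the import ...

# After the load: restore a normal refresh interval and merge segments down
# (optimize was renamed forcemerge in later ES versions).
es.indices.put_settings(index=index, body={"index": {"refresh_interval": "1s"}})
es.indices.optimize(index=index, max_num_segments=5)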
