I am using logstash 1.4.0; if I understand correctly, it automatically uses the Bulk API. Am I missing something? Is there a limit on the size of an index (on a single-node machine)?
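The Bulk API in question takes newline-delimited JSON: one action line followed by one source line per document, and the request body must end with a newline. A minimal hand-rolled sketch for testing indexing speed outside logstash (the index name, type, and fields are placeholders, and the node is assumed to listen on localhost:9200):

    $ cat bulk.json
    { "index" : { "_index" : "logstash-2014.04.09", "_type" : "logs" } }
    { "message" : "first event",  "@timestamp" : "2014-04-09T12:00:00Z" }
    { "index" : { "_index" : "logstash-2014.04.09", "_type" : "logs" } }
    { "message" : "second event", "@timestamp" : "2014-04-09T12:00:01Z" }

    # send the whole file as a single request
    $ curl -s -XPOST 'http://localhost:9200/_bulk' --data-binary @bulk.json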
(BTW Itamar, thanks for the help!)

On Wednesday, April 9, 2014 10:39:19 PM UTC+3, Itamar Syn-Hershko wrote:

> Can you try batching writes to Elasticsearch? See
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html
>
> --
> Itamar Syn-Hershko
> http://code972.com | @synhershko <https://twitter.com/synhershko>
> Freelance Developer & Consultant
> Author of RavenDB in Action <http://manning.com/synhershko/>
>
> On Wed, Apr 9, 2014 at 10:33 PM, Yitzhak Kesselman <[email protected]> wrote:
>
>> Attached is the index rate (using bigdesk):
>> <https://lh5.googleusercontent.com/-Jve-j75qB9o/U0WgK5ZMvMI/AAAAAAAAAFo/5_WZuCryeRw/s1600/bigdesk.png>
>>
>> The indexing requests per second are around 2K and the indexing time per second is around 3K.
>>
>> On Wednesday, April 9, 2014 9:36:12 PM UTC+3, Yitzhak Kesselman wrote:
>>
>>> Answers inline.
>>>
>>> Regarding the slow I/O: when I analyzed the creation of the Lucene index files, I saw that they are created without any special flags (such as no buffering or write-through). This means we pay the cost twice: when we write a file the data goes through Windows' Cache Manager, which takes a lot of memory (memory that is then not available to the application itself), but when we read the file we don't actually read it through the cache, which makes the operation slow. *Any ideas?*
>>>
>>> On Wednesday, April 9, 2014 5:28:11 PM UTC+3, Itamar Syn-Hershko wrote:
>>>
>>>> Shooting in the dark here, but here it goes:
>>>>
>>>> 1. Do you have anything else running on the system? For example, AVs are known to cause slowdowns for such services, and other I/O- or memory-heavy services could cause thrashing or just a general slowdown.
>>>
>>> No, nothing else is running on that machine. Initially it was working fast; it got slower with the amount of data in the index. Also, is there a way to increase the buffer size for the Lucene index files (.tim, .doc, and .pos) from 8K to something much bigger?
>>>
>>>> 2. What JVM version are you running this with?
>>>
>>> java version "1.7.0_51"
>>> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
>>> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
>>> OS_NAME="Windows"
>>> OS_VERSION="5.2"
>>> OS_ARCH="amd64"
>>>
>>>> 3. If you changed any of the default settings for merge factors etc. - can you revert that and try again?
>>>
>>> Tried that before; same behavior.
>>>
>>>> 4. Can you try with embedded=false and see if it makes a difference?
>>>
>>> Tried that before; same behavior.
>>>> --
>>>> Itamar Syn-Hershko
>>>> http://code972.com | @synhershko <https://twitter.com/synhershko>
>>>> Freelance Developer & Consultant
>>>> Author of RavenDB in Action <http://manning.com/synhershko/>
>>>>
>>>> On Wed, Apr 9, 2014 at 4:11 PM, Yitzhak Kesselman <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have configured a single-node ES with logstash 1.4.0 (8GB memory) with the following configuration:
>>>>>
>>>>> - index.number_of_shards: 7
>>>>> - number_of_replicas: 0
>>>>> - refresh_interval: -1
>>>>> - translog.flush_threshold_ops: 100000
>>>>> - merge.policy.merge_factor: 30
>>>>> - codec.bloom.load: false
>>>>> - min_shard_index_buffer_size: 12m
>>>>> - compound_format: true
>>>>> - indices.fielddata.cache.size: 15%
>>>>> - indices.fielddata.cache.expire: 5m
>>>>> - indices.cache.filter.size: 15%
>>>>> - indices.cache.filter.expire: 5m
>>>>>
>>>>> Machine: 16GB RAM, Intel i7-2600 CPU @ 3.4GHz.
>>>>> OS: 64-bit Windows Server 2012 R2
>>>>>
>>>>> My raw data is a CSV file and I use grok as a filter to parse it, with the output configuration (elasticsearch { embedded => true flush_size => 100000 idle_flush_time => 30 }).
>>>>> Raw data size is about 100GB of events per day, which ES tries to write into one index (with 7 shards).
>>>>>
>>>>> At the beginning the insert was fast; however, after a while it got extremely slow, 1.5K docs in 8K seconds :(
>>>>> Currently the index has around 140 million docs with a size of 55GB.
>>>>>
>>>>> When I analyzed the writes to disk with ProcMon, I saw that the process writes in an interleaved manner to three kinds of files (.tim, .doc, and .pos) in 4K and 8K chunks, instead of batching the writes into some reasonably large size.
>>>>>
>>>>> Appreciate the help.
>>>>>
>>>>> All the best,
>>>>> Yitzhak
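One way to narrow down where the indexing time is going in the setup described above is to use two stock diagnostics that exist in ES 1.x (assuming the node listens on the default port 9200):

    # sample the hottest threads on the node while indexing is slow
    curl -s 'http://localhost:9200/_nodes/hot_threads'

    # list the Lucene segments per shard, to see whether small segments are piling up
    curl -s 'http://localhost:9200/_segments?pretty'

If merge threads dominate the hot_threads output, or the segment count keeps growing, the interleaved small writes observed in ProcMon may simply be ongoing segment merges rather than the initial indexing writes.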
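Along the same lines, the per-index stats API can be polled while loading to watch how the indexing, merge, refresh, and flush counters grow over time; the index name below is a hypothetical logstash daily index:

    # counters for one index; repeat the call to see how the merge totals change
    curl -s 'http://localhost:9200/logstash-2014.04.09/_stats?pretty'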
