Pushing up...

On Thursday, April 10, 2014 8:13:33 AM UTC+3, Yitzhak Kesselman wrote:
> I am using logstash 1.4.0; if I understand correctly, it uses the Bulk
> API automatically. Am I missing something?
> Is there a limit on the size of an index (on a single-node machine)?
>
> (BTW Itamar, thanks for the help!)

On Wednesday, April 9, 2014 10:39:19 PM UTC+3, Itamar Syn-Hershko wrote:
> Can you try batching writes to Elasticsearch? See
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html
>
> --
> Itamar Syn-Hershko
> http://code972.com | @synhershko <https://twitter.com/synhershko>
> Freelance Developer & Consultant
> Author of RavenDB in Action <http://manning.com/synhershko/>

On Wed, Apr 9, 2014 at 10:33 PM, Yitzhak Kesselman <[email protected]> wrote:
> Attached is the index rate (using bigdesk):
> <https://lh5.googleusercontent.com/-Jve-j75qB9o/U0WgK5ZMvMI/AAAAAAAAAFo/5_WZuCryeRw/s1600/bigdesk.png>
>
> Indexing requests per second are around 2K, and indexing time per second
> is around 3K.

On Wednesday, April 9, 2014 9:36:12 PM UTC+3, Yitzhak Kesselman wrote:
> Answers inline.
>
> Regarding the slow I/O: when I analyzed the creation of the Lucene index
> files, I saw that they are created without any special flags (such as
> no-buffering or write-through). This means we pay the cost twice: when
> we write a file, the data is cached in Windows' Cache Manager, which
> takes a lot of memory (memory that is then not available to the
> application itself), yet when we read the file we don't actually read it
> through the cache, which makes the operation slow. *Any ideas?*
>
> On Wednesday, April 9, 2014 5:28:11 PM UTC+3, Itamar Syn-Hershko wrote:
>> Shooting in the dark here, but here it goes:
>>
>> 1. Do you have anything else running on the system? For example, AVs
>> are known to cause slowdowns for such services, and other I/O- or
>> memory-heavy services could cause thrashing or just a general slowdown.
>
> No, nothing else is running on that machine. Initially indexing was
> fast; it got slower as the amount of data in the index grew. Also, is
> there a way to increase the buffer size for the Lucene index files
> (.tim, .doc, and .pos) from 8K to something much bigger?
>
>> 2. What JVM version are you running this with?
>
> java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
> OS_NAME="Windows"
> OS_VERSION="5.2"
> OS_ARCH="amd64"
>
>> 3. If you changed any of the default settings for merge factors etc. -
>> can you revert that and try again?
>
> Tried that before; same behavior.
>
>> 4. Can you try with embedded=false and see if it makes a difference?
>
> Tried that before; same behavior (the output section I used for that
> test is sketched below).
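> For reference, this is roughly the output section I used when testing
> with embedded => false; the host and cluster values here are
> placeholders from my test setup, not necessarily what you would use:
>
>   output {
>     elasticsearch {
>       embedded => false
>       host => "localhost"        # placeholder: address of the external ES node
>       cluster => "elasticsearch" # placeholder: cluster name of that node
>       flush_size => 100000       # same bulk-batching settings as before
>       idle_flush_time => 30
>     }
>   }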
On Wed, Apr 9, 2014 at 4:11 PM, Yitzhak Kesselman <[email protected]> wrote:
> Hi,
>
> I have configured a single-node ES with logstash 1.4.0 (8GB memory)
> with the following configuration (the full elasticsearch.yml spelling
> is sketched in the P.S. below):
>
> - index.number_of_shards: 7
> - number_of_replicas: 0
> - refresh_interval: -1
> - translog.flush_threshold_ops: 100000
> - merge.policy.merge_factor: 30
> - codec.bloom.load: false
> - min_shard_index_buffer_size: 12m
> - compound_format: true
> - indices.fielddata.cache.size: 15%
> - indices.fielddata.cache.expire: 5m
> - indices.cache.filter.size: 15%
> - indices.cache.filter.expire: 5m
>
> Machine: 16GB RAM, Intel i7-2600 CPU @ 3.4GHz.
> OS: 64-bit Windows Server 2012 R2
>
> My raw data is a CSV file which I parse with a grok filter, using the
> output configuration (elasticsearch { embedded => true flush_size =>
> 100000 idle_flush_time => 30 }).
> Raw data size is about 100GB of events per day, which ES tries to
> ingest into one index (with 7 shards).
>
> At the beginning inserts were fast, however after a while they got
> extremely slow: 1.5K docs in 8K seconds :(
>
> Currently the index has around 140 million docs with a size of 55GB.
>
> When I analyzed the writes to disk with ProcMon, I saw that the process
> writes in an interleaved manner to three kinds of files (.tim, .doc,
> and .pos) in 4K and 8K segments, instead of batching writes into some
> reasonable size.
>
> Appreciate the help.
>
> All the best,
> Yitzhak
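> P.S. For completeness, here is roughly how I would expect the settings
> above to be spelled out in elasticsearch.yml; I am reconstructing the
> full key prefixes here, so treat this as a sketch rather than my exact
> file:
>
>   index.number_of_shards: 7
>   index.number_of_replicas: 0
>   index.refresh_interval: -1
>   index.translog.flush_threshold_ops: 100000
>   index.merge.policy.merge_factor: 30
>   index.codec.bloom.load: false
>   indices.memory.min_shard_index_buffer_size: 12m
>   index.compound_format: true
>   indices.fielddata.cache.size: 15%
>   indices.fielddata.cache.expire: 5m
>   indices.cache.filter.size: 15%
>   indices.cache.filter.expire: 5m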
