It is an issue, as I am hitting 7000 read operations per second (the limit of my volume's provisioned IOPS).

As the index grows larger the problem worsens: where I was once able to index with 10 clients concurrently, I can now barely use one. I also used the _optimize endpoint to get all segments merged down, and even then the read operations spike immediately on the first indexing operation (I am using BigDesk to follow this). So I do not think it is a merge effect, as my intuition is that a merge only happens every once in a while. Maybe this is actually a result of me not using doc values? Could that be it? To make the question concrete, I've put a few sketches of what I mean below.
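About doc values: as far as I understand they have to be enabled per field in the mapping and only apply to newly indexed data, so testing this would mean creating a new index and reindexing. Something along these lines is what I have in mind (a sketch only; the index, type, and field names are placeholders, not my real mapping):

curl -XPUT 'localhost:9200/logs_v2' -d '{
  "mappings": {
    "event": {
      "properties": {
        "timestamp": { "type": "date",   "doc_values": true },
        "level":     { "type": "string", "index": "not_analyzed", "doc_values": true }
      }
    }
  }
}'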
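For reference, these are roughly the calls I am using to force the merge, and then to verify the segment counts afterwards ('logs' stands in for my real index name; _cat/segments is the plugin-free way I know to see per-shard segments):

# force a merge down to one segment per shard
curl -XPOST 'localhost:9200/logs/_optimize?max_num_segments=1'

# list per-shard segments to verify
curl 'localhost:9200/_cat/segments/logs?v'

# merge statistics, to see whether merges are actually running
curl 'localhost:9200/logs/_stats/merge?pretty'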
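And for completeness, the index settings change quoted below was applied live, roughly like this (again, 'logs' is a placeholder):

curl -XPUT 'localhost:9200/logs/_settings' -d '{
  "index": {
    "refresh_interval": "60m",
    "translog": {
      "flush_threshold_size": "1gb",
      "flush_threshold_ops": 50000
    }
  }
}'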
On Friday, April 24, 2015 at 12:28:50 PM UTC+3, David Pilato wrote:

> That's normal. I was just answering that even if you think you are only
> writing data while indexing, you are also reading data behind the scenes to
> merge Lucene segments.
> You can potentially try to play with index.translog.flush_threshold_size
>
> http://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-translog.html
>
> and increase the transaction log size.
>
> It might help reduce the number of segments generated, but that said, you
> will always have READ operations.
>
> Actually, is it an issue for you? If not, keeping all the default values
> might be good.
>
> Best
>
> --
> David Pilato - Developer | Evangelist
> elastic.co
> @dadoonet | @elasticsearchfr | @scrutmydocs
>
> On 24 Apr 2015, at 10:45, Eran <era...@gmail.com> wrote:
>
> Hey David,
>
> I suspect it indeed might be the cause, but I'm kind of a newbie here.
> What metric do I need to monitor, what would be a problematic value, and,
> basically, how can I play with the merge settings to test whether I can
> improve this? Some rules of thumb for a newbie would be appreciated.
>
> I installed the SegmentSpy plugin; here is a screenshot, if that helps.
>
> Eran
>
> On Friday, April 24, 2015 at 11:02:27 AM UTC+3, David Pilato wrote:
>>
>> Merging segments could be the cause here?
>>
>> David
>>
>> On 24 Apr 2015, at 09:54, Eran <era...@gmail.com> wrote:
>>
>> Forgot some stats:
>>
>> I have 10 shards, no replicas, all on the same machine.
>> ATM there are some 1.5 billion records in the index.
>>
>> On Friday, April 24, 2015 at 10:18:27 AM UTC+3, Eran wrote:
>>>
>>> Attachments hereby.
>>>
>>> On Friday, April 24, 2015 at 9:49:56 AM UTC+3, Eran wrote:
>>>>
>>>> Hello,
>>>>
>>>> I've created an index I use for logging.
>>>>
>>>> This means there are mostly writes, and some searches once in a while.
>>>> During the initial load, I'm using several clients to concurrently
>>>> index documents using the bulk API.
>>>>
>>>> At first, indexing takes 200 ms for a bulk of 5000 documents.
>>>> As time goes by, the indexing time increases and reaches 1000-4500 ms.
>>>>
>>>> I am using an EC2 c3.8xlarge machine with 32 cores and 60 GB of memory,
>>>> with a Provisioned IOPS volume set to 7000 IOPS.
>>>>
>>>> Looking at the metrics, I see that CPU and memory are fine and the
>>>> write IOPS are at 300, but the read IOPS have slowly gone up and reached
>>>> 7000.
>>>>
>>>> How come I am only indexing, yet most of the IOPS are reads?
>>>>
>>>> I am attaching some screen captures from the BigDesk plugin that show
>>>> the two states of the index. At about 20% of the graphs is the point in
>>>> time where I stopped the clients, so you can see the load drop off.
>>>>
>>>> My settings are:
>>>>
>>>> threadpool.bulk.type: fixed
>>>> threadpool.bulk.size: 32 # availableProcessors
>>>> threadpool.bulk.queue_size: 1000
>>>>
>>>> # Indices settings
>>>> indices.memory.index_buffer_size: 50%
>>>> indices.cache.filter.expire: 6h
>>>>
>>>> bootstrap.mlockall: true
>>>>
>>>> and I've changed the index settings to:
>>>>
>>>> {"index":{"refresh_interval":"60m","translog":{"flush_threshold_size":"1gb","flush_threshold_ops":"50000"}}}
>>>>
>>>> I also tried "refresh_interval": "-1".
>>>>
>>>> Please let me know what else I need to provide (settings, logs, metrics).
>
> <Screen Shot 2015-04-24 at 11.42.16.png>
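PS: David, regarding playing with the merge settings — if merge I/O does turn out to be part of this, is the store throttle the right knob to try? This is what I have in mind (a sketch only, not applied yet; the 100mb value is my guess):

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "indices.store.throttle.type": "merge",
    "indices.store.throttle.max_bytes_per_sec": "100mb"
  }
}'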