Hi Eran,

If you are assigning your own document IDs, Elasticsearch needs to look up each ID and check whether the document already exists before writing it. This could explain why bulk insert performance goes down as the index grows, and it may also be why you see mostly read IOPS even though you are only indexing. If you are not going to update the documents, I would therefore recommend letting Elasticsearch assign the document IDs automatically.
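As a rough sketch of what that looks like in a bulk request: the action line for each document simply leaves out "_id", and Elasticsearch generates one. The helper name build_bulk_body, the index name "logs", and the type name "event" below are made up for illustration and are not taken from your setup. The function only builds the newline-delimited body you would POST to the _bulk endpoint; it does not talk to a cluster.

```python
import json


def build_bulk_body(docs, index, doc_type=None):
    """Build a newline-delimited bulk body whose index actions carry no "_id".

    Hypothetical helper for illustration only. Because no "_id" is given,
    Elasticsearch assigns one itself for each document.
    """
    lines = []
    for doc in docs:
        meta = {"_index": index}
        if doc_type is not None:
            # Older releases expect a type; newer ones do not use it.
            meta["_type"] = doc_type
        # Action line: note that there is no "_id" key here.
        lines.append(json.dumps({"index": meta}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [{"message": "service started"}, {"message": "service stopped"}]
    print(build_bulk_body(sample, index="logs", doc_type="event"), end="")
```

If your clients currently put an "_id" in each action line, removing it is the only change needed on the client side.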
Best regards,

Christian

On Friday, April 24, 2015 at 7:49:56 AM UTC+1, Eran wrote:
> Hello,
>
> I've created an index I use for logging.
> This means there are mostly writes, and some searches once in a while.
> In the phase of the first loading, I'm using several clients to concurrently index documents using the bulk API.
>
> At first, indexing takes 200 ms for a bulk of 5000 documents.
> As time goes by, the indexing time increases, and gets to 1000-4500 ms.
>
> I am using an EC2 c3.8xl machine with 32 cores, and 60 GB of memory, with an IO provisioned volume set to 7000 IOPS.
>
> Looking at the metrics, I see that the CPU and memory are fine, the write IOPS are at 300, but the read IOPS have slowly gone up and got to 7000.
>
> How come I'm only indexing, but most of the IOPS are read?
>
> I am attaching some screen captures from the BigDesk plugin that show the two states of the index. After about 20% of the graphs is the point in time where I stopped the clients, so you can see the load drop off.
>
> My settings are:
>
> threadpool.bulk.type: fixed
> threadpool.bulk.size: 32 # availableProcessors
> threadpool.bulk.queue_size: 1000
>
> # Indices settings
> indices.memory.index_buffer_size: 50%
> indices.cache.filter.expire: 6h
>
> bootstrap.mlockall: true
>
> and I've changed the index settings to:
>
> {"index":{"refresh_interval":"60m","translog":{"flush_threshold_size":"1gb","flush_threshold_ops":"50000"}}}
>
> I also tried "refresh_interval":"-1"
>
> Please let me know what else I need to provide if needed (settings, logs, metrics)