Hi,

We were trying to reindex our data into a new index so that we could increase 
the number of shards. The current index has 20 shards, and we wanted to 
increase that to 500.

So, we used the Tire gem's reindex method for this (it basically runs a scan 
search on the index, scrolls through it, and does a bulk insert for each 
scroll batch); a rough sketch of that approach is included after the list 
below. In our dev environment, which has about 250 thousand documents, we 
found that:

   - When we reindex with the default scroll size (10 per shard, so 20 shards 
   means 200 documents per scroll batch), we get data loss: only 50 to 60% of 
   the documents end up in the new index. 
   - When we scroll with the scan API and insert the documents one by one, 
   there is no data loss, but this obviously takes much more time. 
   - When we tried size 1 (20 documents per scroll batch), there was no data 
   loss.
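
For reference, here is a minimal sketch of what we understand the reindex to 
be doing (scan/scroll over the source index, bulk insert into the target). The 
index names are placeholders, and the exact Tire method names are written from 
memory, so they may not match the gem exactly:

    require 'tire'

    SOURCE = 'articles'      # placeholder: our existing 20-shard index
    TARGET = 'articles-new'  # placeholder: the new index with more shards

    target = Tire.index(TARGET)

    # Scan/scroll over the source index; with size 10 and 20 shards, each
    # scroll batch returns up to 200 documents.
    scan = Tire.scan(SOURCE, size: 10)

    scan.each do |results|
      # Re-submit the whole scroll batch to the new index in one bulk request.
      target.bulk_store results
    end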
   
So, we went ahead and tried size 1 (20 documents per scroll batch) in the 
production environment, which has about 30 million documents. We found that:

   - Even with size 1 (20 documents per scroll batch), there was data loss: of 
   the roughly 220 thousand documents we scrolled through, only about 190 
   thousand made it into the new index, so about 30 thousand were lost. It was 
   also slow, so we had to stop partway through.

Why is this data loss happening during bulk insert?
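
In case it is relevant, here is a small sketch (our own helper, not part of 
Tire) of how one could inspect a raw _bulk response for per-item failures, 
since as far as we understand a bulk request can succeed overall while 
individual items are rejected. The exact item structure may vary between 
Elasticsearch versions:

    require 'json'

    # Hypothetical helper: collect the items of a _bulk response that failed.
    def failed_bulk_items(raw_response_body)
      response = JSON.parse(raw_response_body)
      response.fetch('items', []).select do |item|
        action = item.values.first   # the "index"/"create" entry for this item
        action.key?('error') || action.fetch('status', 200).to_i >= 300
      end
    end

If such a check turned up failed items during our reindex, that would at least 
tell us the bulk requests are being partially rejected rather than silently 
dropped.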
