Hi,

We were trying to reindex our index in order to increase its number of shards. The index currently has 20 shards, and we wanted to increase that to 500.
So we used the Tire gem's reindex method for this (it basically runs a scan search on the index, scrolls through it, and bulk inserts each scroll batch). In our dev environment, which has about 250 thousand documents, we found that:

- When we reindex with the default size (10 per shard, so 20 shards means 200 documents per scroll), we get data loss: only 50 to 60% of the documents end up in the new index.
- When we use the scan API and insert the documents one by one, there is no data loss, but this obviously takes more time.
- When we use size 1 (20 documents at a time), there is no data loss.

So we went ahead and tried size 1 (20 documents at a time) in our production environment, which has about 30 million records. There we found that:

- Even with size 1 (20 documents at a time), there was data loss: we had indexed around 220 thousand documents, but only 190 thousand made it into the new index, so about 30 thousand were lost. It was also slow, so we had to stop partway through.

Why is this data loss happening during bulk insert?
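For reference, the pattern we are using is roughly the one sketched below. This is a minimal Python sketch of the scan/scroll + bulk approach, not the actual Tire code; the host, index names and batch size are just placeholders for illustration.

    # Sketch of the scan/scroll + bulk reindex pattern (assumptions:
    # local cluster, index names "old_index"/"new_index", batch size 10).
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import scan, bulk

    es = Elasticsearch("http://localhost:9200")

    def reindex(source_index, target_index, batch_size=10):
        # scan() opens a scroll over the source index and yields every hit
        hits = scan(
            es,
            index=source_index,
            query={"query": {"match_all": {}}},
            size=batch_size,
        )

        # turn each hit into a bulk index action for the target index
        actions = (
            {
                "_index": target_index,
                "_id": hit["_id"],
                "_source": hit["_source"],
            }
            for hit in hits
        )

        # bulk() sends the actions in batches; with raise_on_error=False it
        # returns the per-item failures instead of raising, so rejected
        # documents are visible in `errors`
        success, errors = bulk(es, actions, raise_on_error=False)
        print("indexed: %d, errors: %s" % (success, errors))

    reindex("old_index", "new_index")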
