On Thu, Jun 12, 2014 at 9:31 AM, Ralph Meijer <[email protected]> wrote:

> On 2014-06-12 09:23, Adrien Grand wrote:
> > How many of these 100M documents have been indexed with Elasticsearch
> > 1.2.0?
>
> All of them, and I have like 40 of them, with their size a similar order
> of magnitude :-/

If all of them were indexed in 1.2.0 then it makes sense to just reindex from _source as you suggested.

> > The tool would only send indexing requests for documents that have
> > been misrouted, so if you only see 8 indexing requests per second, which is
> > indeed low, that might mean that the bottleneck is searching for the
> > mis-routed documents.
>
> I couldn't find the source of the tool on GitHub, but maybe if I had
> some insight into how that search works, I could change some settings.
> Ideas welcome, of course. If it helps, I'm in #elasticsearch as ralphm.

It does a SCAN search that filters using a script filter (the one you added to your config/scripts). For each matching document, it reindexes it to the right shard, either by requiring a create operation (copy_if_missing) or by overwriting any existing document that has the same _id (copy_overwrite).

--
Adrien Grand
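
To make the difference between the two modes concrete, here is a minimal in-memory sketch in Python. It is not the tool's source: the shards are plain dicts, `expected_shard` uses CRC32 as a stand-in for Elasticsearch's real routing hash, and the names `find_misrouted` and `repair` are made up for illustration. It also assumes the misrouted original is left in place, since the mode names only mention copying.

```python
import zlib

NUM_SHARDS = 4  # hypothetical shard count, for illustration only


def expected_shard(doc_id, num_shards=NUM_SHARDS):
    # Stand-in routing function (assumption): Elasticsearch uses its own hash.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def find_misrouted(shards):
    """Plays the role of the scan plus script filter: yield every document
    that lives on a shard other than the one its _id routes to."""
    for shard_no, docs in enumerate(shards):
        for doc_id, source in list(docs.items()):
            if expected_shard(doc_id, len(shards)) != shard_no:
                yield shard_no, doc_id, source


def repair(shards, mode="copy_if_missing"):
    """Copy each misrouted document to its correct shard.

    Returns the number of documents actually written to a target shard.
    """
    if mode not in ("copy_if_missing", "copy_overwrite"):
        raise ValueError("unknown mode: %r" % mode)
    writes = 0
    for _shard_no, doc_id, source in list(find_misrouted(shards)):
        target = shards[expected_shard(doc_id, len(shards))]
        if mode == "copy_if_missing" and doc_id in target:
            # "create" semantics: a document with this _id already exists
            # on the right shard, so it is left untouched.
            continue
        # Either the target had no such _id, or we are in copy_overwrite mode.
        target[doc_id] = source
        writes += 1
    return writes
```

In this sketch only misrouted documents ever produce a write, which matches the point above: a low indexing rate says more about how fast the scan finds misrouted documents than about indexing throughput.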
