Thanks Jörg. We'd heard of others pre-creating indices, but we saw it as a workaround rather than a routine step; what you say makes it sound like something we should adopt as standard practice.
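For reference, here is a minimal sketch of what pre-creating an index for a bulk load could look like, following Jörg's suggestions: create the index up front with refresh disabled and no replicas, then restore both once the load is done. It assumes Python 3, a cluster reachable at a hypothetical `ES_URL`, and 3 shards per index as in our setup. The helper names are hypothetical. The `mappings` shape differs between Elasticsearch versions (per-type in 1.x), so it is passed through untouched. Check the `master_timeout` query parameter (the master-side wait, which we believe defaults to 30s) against the version you run.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # hypothetical cluster address


def bulk_mode_create_body(shards=3, mappings=None):
    """Create-index body tuned for a bulk load that is about to start."""
    body = {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": 0,    # add replicas only after the load
                "refresh_interval": "-1",   # disable periodic refresh while loading
            }
        }
    }
    if mappings is not None:
        # Passed through as-is: the expected shape depends on the ES version.
        body["mappings"] = mappings
    return body


def restore_settings_body(replicas=1, refresh_interval="1s"):
    """Settings to apply once the bulk load has finished (ES defaults shown)."""
    return {"index": {"number_of_replicas": replicas,
                      "refresh_interval": refresh_interval}}


def create_index_path(name, master_timeout="60s"):
    """Path for an explicit create-index call with a longer master timeout."""
    return f"/{name}?master_timeout={master_timeout}"


def es_request(method, path, body=None):
    """Send one JSON request; urlopen raises HTTPError on 4xx/5xx responses."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(ES_URL + path, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def precreate_index(name, shards=3, mappings=None, master_timeout="60s"):
    """Create the index before any bulk request can auto-create it."""
    return es_request("PUT", create_index_path(name, master_timeout),
                      bulk_mode_create_body(shards, mappings))


def finish_bulk_load(name, replicas=1, refresh_interval="1s"):
    """Restore normal settings, then wait (up to 60s) for the index to go green."""
    es_request("PUT", f"/{name}/_settings",
               restore_settings_body(replicas, refresh_interval))
    return es_request("GET",
                      f"/_cluster/health/{name}?wait_for_status=green&timeout=60s")
```

Call `precreate_index("index_1112")` for each index before the bulk requests that write to it, and call `finish_bulk_load("index_1112")` once that index has been loaded. Because the index already exists, the bulk API no longer triggers `auto(bulk api)` index creation, and the create-index cluster updates happen up front, where their timeout can be set explicitly.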
On Tuesday, May 13, 2014 12:13:10 PM UTC+1, Jörg Prante wrote:
>
> You should create indexes before bulk indexing. First, bulk indexing works
> much better if all indices and their mappings are already present, the
> operations will run faster and without conflicts, and the cluster state
> updates are less frequent which reduces some noise and hiccups. Second,
> setting the indices refresh rate to -1 and replica level to 0 while in bulk
> indexing mode helps a lot for performance.
>
> If you create 1000+ shards per node, you seem to exceed the limit of your
> system. Do not expect admin operations like index creation to work in O(1)
> time, they are O(n/c) with n = number of affected shards and c the
> threadpool size for the operation (the total node number also counts but I
> neglect it here). So yes, it is expected that index creation operations
> take longer if they reach the limit of your nodes, but there can be plenty
> of reasons for it (increasing shard count is just one of them). And it is
> expected that you see the 30s cluster action timeout in these cases, yes.
>
> There is no strictly predictable resource limit for a node, all this
> depends heavily on factors from outside of Elasticsearch (JVM, CPU, memory,
> disk I/O, your workload of indexing/searching) so it is up to you to
> calibrate your node capacity. After adding nodes, you will observe that ES
> scales well and can handle more shards.
>
> Jörg
>
>
> On Tue, May 13, 2014 at 11:59 AM, Paul <[email protected]> wrote:
>
>> We are seeing a slowdown in shard initialization speed as the number of
>> shards/indices grows in our cluster.
>>
>> With 0-100's of indices/shards existing in the cluster, a new bulk
>> creation of indices up to the 100's at a time is fine: we see them pass
>> through the states and get a green cluster in a reasonable amount of time.
>>
>> As the total cluster size grows to 1000+ indices (3000+ shards) we begin
>> to notice that the first rounds of initialization take longer to process,
>> it seems to speed up after the first few batches, but this slowdown leads
>> to "failed to process cluster event (create-index [index_1112], cause
>> [auto(bulk api)]) within 30s" type messages in the Master logs - the
>> indices are eventually created.
>>
>>
>> Has anyone else experienced this? (did you find the cause / way to fix?)
>>
>> Is this somewhat expected behaviour? - are we approaching something
>> incorrectly? (there are 3 data nodes involved, with 3 shards per index)
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/f34157df-b34e-4d69-a8bd-d8cffb2e5667%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
