Thank you for the link, it's very helpful. I chose 20 shards per daily index because each day would hold around 750 million documents (each with just under 1000 fields). That seemed like a large enough data volume to require many nodes.
If I only have one shard and one replica, then I'll have 365 x 2 = 730 total shards per year. If I run them on a 10 node cluster, will the shards be allocated evenly (73 shards per node) even though each index has only 2 shards (one primary plus one replica) across 365 indices? If I then need to grow the cluster to 20 servers, will the cluster automatically rebalance in a reasonable time? That's a lot of data for the cluster to move!

My main goal is to be able to add hardware to the cluster if needed without re-indexing 750M x 365 = 273,750,000,000 documents (each with 1000 fields), since this could take a very long time. Also, is it reasonable to expect high performance out of a single-shard index with 750M records, each with 1000 fields?

Finally, just as a data point, we're really indexing 750M records x 365 days a year x 7 years, which gives 1,916,250,000,000 documents for the ES cluster to chew on. It'll definitely be a good test of the technology, and it will be interesting to see how the performance holds! It's maybe even a good customer success story to put on the elasticsearch website if all goes well. ;-)

On Wednesday, February 26, 2014 9:13:02 AM UTC-8, Jörg Prante wrote:
>
> I think you have a misconception about shard over-allocation and
> re-indexing, so you should read
>
> https://groups.google.com/d/msg/elasticsearch/49q-_AgQCp8/MRol0t9asEcJ
>
> where kimchy explains how over-allocation of shards work.
>
> If you have time-series indexes, you need not 20 shards per day, just in
> fear to be able to stretch out to 20 nodes in the future. That is only true
> for single, static, non-time-series indexes. With index aliasing and
> routing applied to time-series data, 1 shard (+1 replica) per day might be
> enough (maybe some more like 2 or 3, or more replica, it depends on
> balancing out indexing and search load).
> For a year with a shard per day,
> you will end up in 365 shards plus 365 replica shards which is quite a
> handful, and in theory enough to distribute over 365 nodes. If shards start
> to get tight on resources, use index aliasing and routing. Or just add
> nodes, and ES will automatically redistribute the existing shards to become
> happy again. No re-indexing at all.
>
> Jörg
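
As a sanity check on the numbers in my question, here is a minimal Python sketch of the shard and document arithmetic. It assumes one primary and one replica per daily index, a 365-day year (leap days ignored), and an idealized, perfectly even spread of shards over nodes; the real allocator may not place shards exactly this way. The function and constant names are my own, purely for illustration.

```python
# Back-of-the-envelope sizing for daily time-series indices.
# Assumptions (not ES behavior): 365-day years, perfectly even shard spread.

DOCS_PER_DAY = 750_000_000
DAYS_PER_YEAR = 365
YEARS = 7


def shards_per_year(primaries_per_index=1, replicas_per_primary=1):
    """Total shard copies (primaries plus replicas) for one year of daily indices."""
    return DAYS_PER_YEAR * primaries_per_index * (1 + replicas_per_primary)


def shards_per_node(total_shards, nodes):
    """Return (fewest, most) shards on any node under a perfectly even spread."""
    fewest = total_shards // nodes
    most = -(-total_shards // nodes)  # integer ceiling division
    return fewest, most


total = shards_per_year()
print(total)                                   # 730
print(shards_per_node(total, 10))              # (73, 73)
print(shards_per_node(total, 20))              # (36, 37)
print(DOCS_PER_DAY * DAYS_PER_YEAR)            # 273750000000
print(DOCS_PER_DAY * DAYS_PER_YEAR * YEARS)    # 1916250000000
```

So one year is 730 shard copies, which works out to 73 per node on 10 nodes and 36 or 37 per node on 20 nodes if the spread were perfectly even.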
