Thank you for the link; it's very helpful.  The reason I chose 20 shards per 
daily index was that each day would hold around 750 million documents (each 
with just under 1000 fields).  This seemed like a fairly high data 
requirement that would need many nodes.  

If I only have one shard and one replica per daily index, then I'll have 
365 x 2 = 730 total shards per year.  If I run them on a 10 node cluster, 
will the shards be allocated evenly (73 shards per node), even though each 
index contributes only 2 shards (across 365 indices)?  If I then need to 
grow the cluster to 20 servers, will the shards automatically rebalance in 
a reasonable time?  That's a lot of data for the cluster to move!  My main 
goal is to be able to add hardware to the cluster if needed without 
re-indexing 750M x 365 = 273,750,000,000 documents (each with 1000 fields), 
since that could take a very long time.  Also, is it reasonable to expect 
high performance out of a single-shard index with 750M records, each with 
1000 fields?
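
As a sanity check on the numbers above, here is a minimal Python sketch of 
the arithmetic.  It assumes shards spread perfectly evenly across nodes, 
which is an idealization, and the function name shard_layout is purely 
illustrative (it is not part of any Elasticsearch API):

```python
def shard_layout(indices, primaries_per_index, replicas_per_primary, nodes):
    """Return (total shards, average shards per node).

    Assumes a perfectly even spread across nodes (an idealization).
    """
    total = indices * primaries_per_index * (1 + replicas_per_primary)
    return total, total / nodes


# One daily index for a year, 1 primary + 1 replica each.
total, per_node = shard_layout(365, 1, 1, 10)
print(total, per_node)   # 730 73.0

# Same shards after growing the cluster to 20 nodes.
total, per_node = shard_layout(365, 1, 1, 20)
print(total, per_node)   # 730 36.5

# Documents indexed per year at 750M per day.
docs_per_year = 750_000_000 * 365
print(docs_per_year)     # 273750000000
```

So doubling the node count would halve the average shards per node, with 
roughly half of the existing shards needing to relocate.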

Finally, just as a data point, we're really indexing 750M records x 365 
days a year x 7 years, which gives 1,916,250,000,000 documents for the ES 
cluster to chew on.  It'll definitely be a good test of the technology, and 
it will be interesting to see how the performance holds up!  It might even 
make a good customer success story for the Elasticsearch website if all 
goes well.  ;-)

On Wednesday, February 26, 2014 9:13:02 AM UTC-8, Jörg Prante wrote:
>
> I think you have a misconception about shard over-allocation and 
> re-indexing, so you should read
>
> https://groups.google.com/d/msg/elasticsearch/49q-_AgQCp8/MRol0t9asEcJ
>
> where kimchy explains how over-allocation of shards works.
>
> If you have time-series indexes, you do not need 20 shards per day just 
> for fear of not being able to stretch out to 20 nodes in the future. That 
> is only true for single, static, non-time-series indexes. With index 
> aliasing and routing applied to time-series data, 1 shard (+1 replica) per 
> day might be enough (maybe a few more, like 2 or 3, or more replicas; it 
> depends on balancing out indexing and search load). For a year with a 
> shard per day, you will end up with 365 shards plus 365 replica shards, 
> which is quite a handful, and in theory enough to distribute over 365 
> nodes. If shards start to get tight on resources, use index aliasing and 
> routing. Or just add nodes, and ES will automatically redistribute the 
> existing shards to become happy again. No re-indexing at all.
>
> Jörg
>
>
