Hi all, I've seen lots of posts about this, and want to make sure I'm understanding correctly.
Background:

- Our cluster has 6 servers: Dell R720xd, 64 GB RAM, 2x E5-2600 v2 CPUs (2 sockets, 6 cores/socket), 16 TB disk.
- Elasticsearch is configured with 6 shards and 1 replica per index, giving two shards per server. I'm giving ES 32 GB heaps on Java 1.7 with the G1 GC.

I'm concerned about the size of our indices. Right now we store all data in one index per day, with various types within it to separate the data. The indices average about 50 GB/day (not including replicas), so each shard is about 8 GB. We have a LOT more data to index, at least 20x more.

Should I be concerned with indices of that size (~1000 GB) and shards of that size (~160 GB)? Is it merely a question of having enough hardware, or is there more to it?

I'm considering splitting the data under a different indexing strategy, so that each index is smaller but there are more of them. The total amount of data stays the same, so I'm not sure whether that will help. If I'm optimizing for searching, does querying many smaller indices perform better than querying fewer larger ones?

Thank you for your time.

Chris
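For reference, the shard-size figures above can be checked with some quick arithmetic. This is just a back-of-envelope sketch; the 6-primary-shard layout, 50 GB/day volume, and 20x growth factor are the numbers from the post:

```python
# Back-of-envelope shard sizing for a daily-index setup with 6 primary shards.
# Replicas are excluded, matching the 50 GB/day figure quoted above.

def shard_size_gb(daily_index_gb, primary_shards):
    """Average size of one primary shard in a single daily index."""
    return daily_index_gb / primary_shards

today = shard_size_gb(50, 6)        # current volume
future = shard_size_gb(50 * 20, 6)  # projected 20x volume

print(round(today, 1))   # ~8.3 GB per shard today
print(round(future, 1))  # ~166.7 GB per shard at 20x
```

So the ~8 GB and ~160 GB figures in the question line up with the stated daily volumes divided across 6 primaries.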
