Hey all! Background: I'm using Elasticsearch with Logstash to do some log analysis. My use case is write-heavy, and I've configured ES accordingly. After experimenting with different setups, I'm considering the following implementation:
*Separate log processing from the ES cluster*

- 1x Logstash server
- 2x ES servers (1x master, 1x data-only), each with:
  - 17GB of memory
  - a single ES node with 9GB of heap allocated

(I've put rough config sketches at the end of this post.)

This should be plenty of memory for the relatively small dataset I am starting with, and we can expand as needed. However, I have the following questions/concerns:

It is my understanding that, ideally, you want one shard per index per node (plus one replica shard per primary shard per node, assuming the number of replicas is set to 1), meaning in this setup I would set the number of shards per index to 2. Each index is, as of now, relatively small (~500MB), so two shards should be fine. However, as we scale the project, the indices will grow, and we will eventually want to split them across more shards.

On the hardware side, the ES servers are relatively lightweight. As we scale, we have the option to simply beef up the hardware.

Finally, my understanding is that increasing the number of shards per index down the line requires a full reindex of the data, which I would like to avoid. It seems to me that I would be better off setting shards per index to 4 now, in anticipation of future scaling. Are there costs to this that I am missing?

What about starting off with a single ES node on a beefier server? Should I be concerned about availability with a single-node cluster (no replicas)?

Thanks for reading :)
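For concreteness, here's roughly what I mean by the node roles, as I'd put it in each node's elasticsearch.yml (just a sketch of the plan, not battle-tested; the hostnames in the comments are made up):

    # es-master: master-eligible, holds no data
    node.master: true
    node.data: false

    # es-data-1: data-only, never elected master
    node.master: false
    node.data: true

with the 9GB heap set via ES_HEAP_SIZE=9g on each box.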
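And for the shard count, since Logstash creates time-based indices, I'd set it with an index template so every new logstash-* index picks it up, rather than per index. Again a sketch: "logstash_shards" is just a name I picked, and the 30s refresh_interval is only one example of the kind of write-heavy tuning I mentioned, not a settled value:

    curl -XPUT 'http://localhost:9200/_template/logstash_shards' -d '{
      "template": "logstash-*",
      "settings": {
        "number_of_shards": 4,
        "number_of_replicas": 1,
        "refresh_interval": "30s"
      }
    }'

Note the template only applies to indices created after it's in place; existing indices keep their shard count, which is exactly the reindexing issue I'm trying to avoid.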
