A simple cluster: 2 nodes, 1 replica. Each node has 1.5 GB RAM, 2 cores, and SAS disks. With Elasticsearch 1.1 we saw occasional disconnections and CPU load spikes, caused by bad Logstash usage (lots of tiny bulk imports, with monthly indices). The Logstash usage was fixed, and Elasticsearch was upgraded to 1.3: 1.3.3, then 1.3.4 ten minutes later. CPU usage is now stuck at 100% (so one core fully used), LOTS of file descriptors are open, and memory usage keeps growing. RAM was upgraded to 2 GB.
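In case it helps, this is roughly how I am watching the file descriptors and memory (the host:port is my local node, adjust as needed):

    # open file descriptors and process memory, per node
    curl -s 'localhost:9200/_nodes/stats/process?pretty'
    # JVM heap usage
    curl -s 'localhost:9200/_nodes/stats/jvm?pretty'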
strace shows that 5 threads use a lot of CPU and that 1 thread does about 7000 stat() calls per second. The Elasticsearch hot threads API shows lots of time spent in FSDirectory.listAll. Disk usage is low, it is just a lot of stat() calls. The number of shards per index is set to 9, and Logstash opens lots of indices: 2286 shards for 7 GB of data, and 37487 files in the indices folder. In the recovery API everything is "done", but with strange percentage values, and all shards show a "replica" state. Now the load comes in heavy waves, slowing down the service. (The commands I used for these observations are pasted below.)

Is this just a long migration between Lucene versions (from ES 1.1 to 1.3), a misconfiguration, a real bug, or am I just doomed?
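For completeness, these are roughly the commands behind the numbers above; the thread id, host:port and data path come from my setup, so adjust them for yours:

    # count syscalls for the busiest thread (TID taken from top -H), ~7000 stat()/s here
    strace -c -p <thread_id>

    # hot threads, where FSDirectory.listAll shows up
    curl -s 'localhost:9200/_nodes/hot_threads'

    # shard count and states (2286 shards here)
    curl -s 'localhost:9200/_cat/shards' | wc -l

    # recovery status with the odd percentages
    curl -s 'localhost:9200/_recovery?pretty'

    # file count under the data path (Debian default path here)
    find /var/lib/elasticsearch/*/nodes/0/indices -type f | wc -l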
