As I posted before, our system doesn't fit very well in a cluster structure because we have many small indices (about 1k indices with an average of 6k records each). We guessed that with so many small indices, the cluster spent too much time and too many resources deciding which nodes should be master, where to locate absurdly small shards, etc. Bottom line: the cluster always ended up not working right. BTW, I suspect that with a few advanced cluster tuning options (shard routing and the like) we might be able to get it working again, but unfortunately we can't find that kind of knowledge in the standard docs. If any of you have any hints on this, they would be greatly appreciated!
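To put "absurdly small shards" in numbers, here is a quick back-of-the-envelope sketch. It assumes the indices were created with the Elasticsearch defaults of that era (5 primary shards and 1 replica per index). That is an assumption, since the actual index settings aren't given above. The helper name `shard_footprint` is hypothetical.

```python
def shard_footprint(num_indices: int, avg_docs_per_index: int,
                    primaries_per_index: int = 5, replicas: int = 1):
    """Return (total shard copies in the cluster, average docs per primary shard).

    Defaults of 5 primaries / 1 replica are an ASSUMPTION, not taken from the post.
    """
    # Every primary shard has `replicas` extra copies placed on other nodes.
    total_shard_copies = num_indices * primaries_per_index * (1 + replicas)
    # Documents are spread across the primaries of each index.
    docs_per_primary = avg_docs_per_index / primaries_per_index
    return total_shard_copies, docs_per_primary


print(shard_footprint(1000, 6000))  # (10000, 1200.0)
```

Under those assumptions the cluster has to allocate and track about 10,000 shard copies, each primary holding only around 1,200 documents. Changing the assumed settings (for example `primaries_per_index=1`) shows how strongly the shard count depends on them.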
Anyway, we need to scale the system somehow, and this is what we've come up with:

- Our indices can have configuration variations that require a reindex at any time. It doesn't happen often, but it happens, and with 1k indices it's bound to happen.
- Indexing data is regenerated every day, so every day the whole set of indices is re-created (we figured it's much faster to "recreate" an index than to update an existing one by replacing every one of its records).

We would like the machines used for searching to be used only for that, and never for indexing/reindexing operations, because we don't want the user experience to suffer when a search hits a server that is already loaded with heavy indexing. In our ideal scenario, indexing/reindexing would be done on dedicated machines, as many as needed, and searching would be done on different machines.

We plan to use the snapshot/restore feature for that. Any time an index/reindex is needed, it would be done on one of these "indexing machines"; the fresh index would then be snapshotted and restored to the search machine afterwards. We would need some client-side control to make sure only one snapshot process runs at a time; my understanding is that this restriction doesn't apply to restores (i.e. you can have more than one restore process running on a cluster).

Indexing of individual items happens occasionally, but I figure when that happens we can just index to both the search machines and the indexing machines, because it's never going to be big.

(Wherever I say "machine" above, please read "cluster".)

How crazy does this whole thing sound? Is there any other way we can get some scalability?
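The publish flow described above can be sketched against the Elasticsearch snapshot/restore REST endpoints (`PUT /_snapshot/<repo>/<snapshot>` and `POST /_snapshot/<repo>/<snapshot>/_restore`). This is a minimal sketch under several assumptions. The `IndexPublisher` class, its method names, the repository name and the cluster URLs are all hypothetical. The repository is assumed to be registered on both clusters and to point at the same shared location. The `threading.Lock` only serializes snapshots issued from this one client process, not across several processes. The sketch also assumes the target index on the search cluster has already been closed or deleted, because a restore will not overwrite an open index. Single-item writes use the 1.x-style `/{index}/{type}/{id}` path and are simply sent to both clusters.

```python
import json
import threading
import urllib.request
from typing import Callable, Optional

# send(base_url, method, path, body): injectable so the flow can be exercised
# without a live cluster.
Send = Callable[[str, str, str, Optional[dict]], object]


def http_send(base_url: str, method: str, path: str, body: Optional[dict]) -> dict:
    """Send one JSON request using only the standard library and decode the reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(base_url + path, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


class IndexPublisher:
    """Hypothetical helper: build on the indexing cluster, serve from the search cluster."""

    def __init__(self, indexing_url: str, search_url: str, repo: str,
                 send: Send = http_send) -> None:
        self.indexing_url = indexing_url
        self.search_url = search_url
        self.repo = repo  # ASSUMED registered on BOTH clusters (shared location)
        self.send = send
        # Client-side guard: at most one snapshot in flight from this process.
        self._snapshot_lock = threading.Lock()

    def publish(self, index: str, snapshot: str) -> None:
        """Snapshot a freshly built index, then restore it for searching."""
        with self._snapshot_lock:
            self.send(self.indexing_url, "PUT",
                      f"/_snapshot/{self.repo}/{snapshot}?wait_for_completion=true",
                      {"indices": index})
        # The restore runs outside the lock, so restores may overlap.
        # ASSUMES the target index on the search cluster is closed or absent.
        self.send(self.search_url, "POST",
                  f"/_snapshot/{self.repo}/{snapshot}/_restore?wait_for_completion=true",
                  {"indices": index})

    def index_item(self, index: str, doc_type: str, doc_id: str, doc: dict) -> None:
        """Write one small document to both clusters (1.x-style URL)."""
        for url in (self.search_url, self.indexing_url):
            self.send(url, "PUT", f"/{index}/{doc_type}/{doc_id}", doc)
```

Keeping `send` injectable means the sequencing and the one-snapshot-at-a-time guard can be checked without a running cluster; `http_send` is the only piece that actually talks to Elasticsearch.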
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/82d7dd51-1b86-4b0f-8abc-425a45f1dfac%40googlegroups.com.
