As I posted before, our system does not fit very well into a cluster
structure because we have many small indices (about 1k indices with an
average of 6k records each). We guessed that, with so many small indices,
the cluster spent too much time and resources deciding which node should be
master, where to place absurdly small shards, etc. The bottom line is that
the cluster always ended up not working right. BTW, I suspect that with a
few advanced cluster tuning options (shard routing and the like) we may be
able to get it working again, but unfortunately we can't find that kind of
knowledge in the standard docs. If any of you have any hints on this, they
would be greatly appreciated!

Anyway, we need to scale the system somehow, and this is what we've come up 
with:

  - Our indices can have configuration changes that make a reindex
necessary at any time. It doesn't happen a lot, but it happens, and with 1k
indices it's bound to happen.
  - Indexing data is regenerated every day, so every day the whole set of
indices is re-created (we figured it's much faster to "recreate" an index
than to update an existing one by replacing every one of its records).

We would like the machines used for searching to be used only for that,
and never for indexing/reindexing operations, because we don't want the
user experience to suffer when searches hit a server that is already
loaded with heavy indexing.

In our ideal scenario, indexing/reindexing would be done on dedicated
machines, as many as needed, and searching would be done on different
machines. We plan to use the snapshot/restore feature for that.

Any time an index/reindex is needed, it would be done on one of these
"indexing machines"; the fresh index would then be snapshotted and
afterwards restored to the search machine. We would need some client-side
control to make sure only one snapshot process runs at a time. It's my
understanding that this restriction does not apply to restores (i.e. you
can have more than one restore process running on a cluster).
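
A minimal sketch of that hand-off from an indexing cluster to the search
cluster, in Python, assuming both clusters have registered the same shared
snapshot repository (the name daily_repo is hypothetical) and that the
caller passes in functions that actually send HTTP requests to each
cluster. The request paths follow the snapshot/restore REST endpoints
(PUT /_snapshot/<repo>/<snapshot> and POST /_snapshot/<repo>/<snapshot>/_restore);
SnapshotCoordinator and its methods are made up for illustration. The
threading.Lock only gives the "one snapshot at a time" guarantee inside a
single client process; several independent indexing processes would need
some shared lock instead. Also note that a restore fails if an open index
with the same name already exists on the search cluster, so the old copy
would have to be closed or deleted first (or restored under a new name).

```python
import threading
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# Hypothetical repository name; it must be registered on BOTH clusters and
# point at storage both can reach (e.g. a shared filesystem mount).
REPO = "daily_repo"


@dataclass
class Request:
    method: str
    path: str
    body: Dict[str, Any] = field(default_factory=dict)


def snapshot_request(index: str, snapshot: str) -> Request:
    # Snapshot just this one index on the indexing cluster and block until
    # the snapshot has finished, so the lock below covers the whole run.
    return Request(
        "PUT",
        f"/_snapshot/{REPO}/{snapshot}?wait_for_completion=true",
        {"indices": index, "include_global_state": False},
    )


def restore_request(index: str, snapshot: str) -> Request:
    # Restore the same index from the shared repository on the search cluster.
    return Request(
        "POST",
        f"/_snapshot/{REPO}/{snapshot}/_restore",
        {"indices": index, "include_global_state": False},
    )


Sender = Callable[[Request], Any]


class SnapshotCoordinator:
    """Hypothetical client-side helper: serializes snapshots, not restores."""

    def __init__(self, send_to_indexing: Sender, send_to_search: Sender) -> None:
        self._send_to_indexing = send_to_indexing
        self._send_to_search = send_to_search
        # Only guards callers within this one process.
        self._snapshot_lock = threading.Lock()

    def publish(self, index: str, snapshot: str) -> Any:
        with self._snapshot_lock:
            self._send_to_indexing(snapshot_request(index, snapshot))
        # Restores are left unserialized: several may run concurrently.
        return self._send_to_search(restore_request(index, snapshot))
```

With the requests library, a sender could be as simple as
lambda req: requests.request(req.method, "http://indexing-host:9200" + req.path, json=req.body)
(host name hypothetical); in practice you would also check the snapshot
response before starting the restore.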

Indexing of individual items can happen occasionally, but I figure that
when it does, we can just index them into both the search machines and the
indexing machines, because it's never going to be a big volume.
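
The occasional dual write could look roughly like the sketch below,
assuming a caller-supplied send function (hypothetical signature: cluster
name, index, document id, document) that performs the actual index request
and raises on failure; index_everywhere is a made-up name. Indexing under
an explicit document id means a retry simply overwrites the same document,
so it is safe to resend to whichever clusters failed.

```python
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical: send(cluster, index, doc_id, doc) indexes one document on one
# cluster and raises an exception if the request fails.
Sender = Callable[[str, str, str, Dict[str, Any]], None]


def index_everywhere(
    send: Sender,
    clusters: Iterable[str],
    index: str,
    doc_id: str,
    doc: Dict[str, Any],
) -> List[str]:
    """Write one document to every cluster; return the clusters that failed."""
    failed: List[str] = []
    for cluster in clusters:
        try:
            send(cluster, index, doc_id, doc)
        except Exception:
            # Keep going so one unreachable cluster doesn't block the others;
            # the caller can retry just the returned clusters.
            failed.append(cluster)
    return failed
```

The indexing cluster stays in the list so that a snapshot taken there
before the next daily rebuild still contains the update.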

Please read "cluster" wherever I wrote "machine".

How crazy does this whole thing sound? Is there any other way we can get
some scalability?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/82d7dd51-1b86-4b0f-8abc-425a45f1dfac%40googlegroups.com.
