Hi, we are considering using ES as the primary data source for a new project. Our data is generated by millions of different users, *each having a relatively small number of documents, yet each with a different data schema.*
*We are considering several approaches:*

* Index per user - we are concerned about scaling the ES cluster to support millions of indexes, each holding a relatively small number of docs.
* All users colocated in a single index - we are concerned that an ES index will not support millions of different fields (as each user has a different data schema).
* A mix of the two above - having X users colocated in a single index, and Y such indexes to host our entire user population.
* Implementing some kind of "mapping layer" that maps each user's schema onto generic fields in one or more indexes. This would probably work, but is of course harder to implement and maintain.

*So my questions:*

* Are there production deployments out there with a million active indexes? What do they look like?
* How many different fields does it make sense to host in a single index? Would it scale to millions of fields in a single index?
* Are there other ways to go about this that we have overlooked?

Thanks!!
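To make the "mapping layer" option concrete, here is a minimal sketch of the idea: each user's ad-hoc field names are translated onto a fixed pool of generic, typed fields (`str_0`, `long_0`, ...) so that a single ES index mapping can serve every user. All names here (`FieldMapper`, `encode_doc`, the generic field naming scheme) are illustrative assumptions, not Elasticsearch APIs; in practice the per-user field map itself would need to be persisted somewhere.

```python
# Sketch of a per-user field-mapping layer (hypothetical, not ES API).
# User field names are rewritten to generic typed slots so one index
# mapping covers millions of users with different schemas.

class FieldMapper:
    def __init__(self):
        self._maps = {}      # per-user: {user_field_name: generic_field_name}
        self._counters = {}  # per (user, type): next free slot number

    def _generic_name(self, user_id, value):
        # Pick a typed slot pool based on the value's type (simplified
        # to long/str here; a real version would cover more ES types).
        kind = "long" if isinstance(value, int) else "str"
        key = (user_id, kind)
        n = self._counters.get(key, 0)
        self._counters[key] = n + 1
        return f"{kind}_{n}"

    def encode_doc(self, user_id, doc):
        """Translate a user's document into generic field names before indexing."""
        mapping = self._maps.setdefault(user_id, {})
        out = {"user_id": user_id}
        for field, value in doc.items():
            if field not in mapping:
                mapping[field] = self._generic_name(user_id, value)
            out[mapping[field]] = value
        return out

    def encode_field(self, user_id, field):
        """Translate a user's field name for use in a query."""
        return self._maps[user_id][field]
```

For example, `encode_doc("alice", {"age": 30, "city": "Paris"})` would produce `{"user_id": "alice", "long_0": 30, "str_0": "Paris"}`, and queries against `age` would be rewritten to target `long_0` filtered by `user_id`.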
