Hi Mark,

The problem we have is that each "customer" could generate 60-80 million docs per month on average. In addition, when a customer leaves, we need to delete all of their data. Hence it makes sense to have an index per customer (or even multiple indexes per customer), so that removing a customer is just a matter of dropping their indexes.

Another issue is that we are going to need a lot of "has_child" queries. As ES currently stands, it loads the IDs of all the parent docs in the index before running the query. So if we keep each customer in their own index, those has_child queries only need to load the IDs for that specific customer.

In addition, one index with one shard per day is how Logstash works, and Logstash is designed for ingesting a lot of data.
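
To make the per-customer layout concrete, here is a rough sketch of the three requests it boils down to: creating the index with one shard and one replica, dropping it when a customer leaves, and running a has_child search scoped to a single customer. The "events-" prefix, the helper names, and the "event" child type are made up for illustration; the code only builds the HTTP method, path, and JSON body and does not talk to a cluster.

```python
import json
import re
from typing import Optional, Tuple

Request = Tuple[str, str, Optional[dict]]  # (HTTP method, path, JSON body)


def customer_index(customer_id: str) -> str:
    """Hypothetical naming scheme: one index per customer."""
    # Index names must be lowercase; map any other character to "-".
    safe = re.sub(r"[^a-z0-9_-]", "-", customer_id.lower())
    return "events-" + safe


def create_index_request(customer_id: str) -> Request:
    """Create the customer's index with 1 primary shard and 1 replica."""
    body = {"settings": {"number_of_shards": 1, "number_of_replicas": 1}}
    return "PUT", "/" + customer_index(customer_id), body


def delete_customer_request(customer_id: str) -> Request:
    """Dropping the whole index removes all of the customer's data."""
    return "DELETE", "/" + customer_index(customer_id), None


def has_child_search_request(customer_id: str, child_type: str,
                             child_query: dict) -> Request:
    """has_child search limited to one customer's index."""
    body = {"query": {"has_child": {"type": child_type, "query": child_query}}}
    return "POST", "/" + customer_index(customer_id) + "/_search", body


if __name__ == "__main__":
    requests = [
        create_index_request("acme"),
        has_child_search_request("acme", "event", {"term": {"kind": "click"}}),
        delete_customer_request("acme"),
    ]
    for method, path, body in requests:
        print(method, path, json.dumps(body) if body is not None else "")
```

Because every request path starts with the customer's own index name, the has_child search never touches another customer's parent IDs, and offboarding is a single DELETE.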
- Drew

On Jun 26, 2014, at 6:24 PM, Mark Walkom <[email protected]> wrote:

> Pretty sure he read it as I'd have offered the same advice :)
>
> You cannot change the sharding of an index after creation, you need to
> completely reindex the data to do so. This may not be a major issue for you
> but it's something to take into account when you have hundreds or thousands
> of customers, and hence indexes.
>
> You could also look at having a few indexes and use aliases and routing as
> this would be a much more efficient way of doing things.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: [email protected]
> web: www.campaignmonitor.com
>
> On 27 June 2014 11:21, Drew Kutcharian <[email protected]> wrote:
>
>> Hi Andrew,
>>
>> Not sure if you read my original question. The question is about having a
>> separate index per customer since we are going to have < 1000 customers but
>> each would have a lot of data. Each shard comes with its own overhead since
>> it's an instance of Lucene. I was going with the 1 shard with 1 replica route
>> because initially we can put 100 of these customers on the same machine and
>> as they grow larger we can allocate more machines and move the indexes
>> around. With this approach, our capacity for a single customer would be the
>> max a single machine can handle, which I think should be enough given our
>> requirements. If a customer is really pushing a single machine to its max,
>> then we can move them to their own Elasticsearch cluster.
>>
>> - Drew
>>
>> On Jun 26, 2014, at 1:57 PM, Andrew Selden <[email protected]> wrote:
>>
>>> Drew,
>>>
>>> The Elasticsearch default is to create 5 shards for each index. I would
>>> start with this. Typically it is best to actually over-shard, which is to
>>> say have more than 1 shard per node per index. There is not really any
>>> measurable cost to this and it gives you flexibility in your design as you
>>> scale out.
>>>
>>> For example, if you start with 5 shards on a single server and then later
>>> decide you want to add another machine, Elasticsearch will automatically
>>> transfer some of those shards over to the new server, giving you better
>>> scalability. If you start with only 1 shard you will not get this benefit.
>>>
>>> Andrew
>>>
>>> On Jun 26, 2014, at 8:29 PM, Drew Kutcharian <[email protected]> wrote:
>>>
>>>> Hey Guys,
>>>>
>>>> I'm working on an analytics dashboard project where we collect events into
>>>> Elasticsearch for clients. Each client could have millions of events per
>>>> month. We are thinking of using one index with one shard and one replica
>>>> per client. Looking at Logstash, it seems like Logstash creates 1 index,
>>>> with 1 shard and 0 replicas per day, so that's where we got the
>>>> inspiration. We don't anticipate having more than 1000 "clients". Are
>>>> there any issues with this design pattern?
>>>>
>>>> Thanks,
>>>>
>>>> Drew
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google Groups
>>>> "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an
>>>> email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/elasticsearch/9DC88022-E37D-4C55-81E6-71A52EC5B466%40venarc.com.
>>>> For more options, visit https://groups.google.com/d/optout.
