I would certainly like to see that book, or at least a draft of it ;-)

On Wed, Apr 22, 2015 at 10:12 AM, Kimbro Staken <ksta...@kstaken.com> wrote:
> Hello Fred,
>
> I have clusters as large as 200 billion documents/130TB. Sharing
> experiences on that would require a book, but a couple of quick things
> jumped out at me:
>
> 1. Do not go the huge-server route. Elasticsearch works best when you
> scale it horizontally. The 64GB-machine route is a much better option.
>
> 2. If I understand correctly, you're routing an entire month's data to a
> single shard? By doing that you're directing all activity on that shard to
> a single machine, or a small set of machines if you have replicas. That has
> to be much slower than using something like a monthly index with a
> reasonable number of shards to spread that load across the cluster.
> It is also creating fairly large shards, and if you have month-to-month
> variation in data rates you'll end up with "lumpy" shard sizes, which will
> definitely cause issues if you ever run your cluster low on disk space.
>
> 3. Get off of ES 1.3 as fast as you can. 8TB spread across 37 machines is
> very low density; as you push more data in, you don't want to be on ES 1.3.
>
> 4. If you're not already using doc_values, start looking into it now.
> Managing heap memory is, let's be nice and call it, "a challenge", and
> fielddata can eat heap in ways that will make your head spin.
>
> Kimbro Staken
>
> On Wed, Apr 22, 2015 at 1:14 AM, <fdevilla...@synthesio.com> wrote:
>
>> Hi list,
>>
>> I've been using ES in production since 0.17.6 with clusters of up to 64
>> virtual machines and 20TB of data (including 3 replicas). We're now
>> thinking about pushing things a bit further and I wondered if people here
>> had similar experience / needs as we do.
>>
>> Our current index is 1.1 billion unique documents, 8TB of data (including
>> 1 replica) on 37 physical machines (32 data nodes, 3 master nodes and 2
>> nodes dedicated to HTTP requests) with ES 1.3 (upgrade to 1.5 already
>> planned). We're indexing about 2500 new documents / second and
>> everything's fine so far.
>>
>> Our goal is to index (and search) about 30 billion more documents (the
>> backdata) + about 200 million new documents each month.
>>
>> Our company provides analytics dashboards to its clients, and they
>> mostly browse their data on a monthly scale, so we're routing documents
>> monthly. Each shard is between 200 and 250GB. The index has 128 shards,
>> which covers about 10 years of data at 1 month per shard. Considering
>> what we already have, we should reach 240TB of data (and counting) with a
>> single replica after we index all our backdata.
>>
>> So, my questions here:
>>
>> - Does anyone here have the same use case / amount of data as we do?
>>
>> - Is ES the right technology for real-time, lightspeed queries (filtered
>> queries and high-cardinality aggregations) on such an amount of data?
>>
>> - What traps should we avoid? Is it better to add lots of medium
>> machines (6-core/12-thread Xeon E5-1650 v2, 64GB RAM, 1.8TB SAS 15k hard
>> drives) or a few huge machines with petabytes of RAM, terabytes of SSD and
>> multiple ES processes?
>>
>> Any feedback on a similar situation is much appreciated.
>>
>> Have a nice day,
>> Fred
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/6865703f-2302-4fe0-b929-eb9fbe55a84a%40googlegroups.com
>> <https://groups.google.com/d/msgid/elasticsearch/6865703f-2302-4fe0-b929-eb9fbe55a84a%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
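Kimbro's points 2 and 4 (one monthly index with several primary shards instead of routing a whole month into a single shard, plus doc_values on the fields used for filters and aggregations) can be sketched in plain Python as below. This is a minimal sketch under stated assumptions: the `docs-YYYY.MM` naming, the `doc` mapping type, the `client_id`/`published_at` fields and the choice of 8 primary shards are all hypothetical, and the mapping follows ES 1.x syntax (`string` + `not_analyzed` + `doc_values`), so check it against the version you actually run. It only builds the index name and the create-index body; sending it to the cluster is left to whatever client you use.

```python
from datetime import datetime

# Hypothetical prefix: the thread does not say what the real index is called.
INDEX_PREFIX = "docs"


def monthly_index_name(ts: datetime, prefix: str = INDEX_PREFIX) -> str:
    """Name of the monthly index that a document timestamped `ts` goes into."""
    return f"{prefix}-{ts.year:04d}.{ts.month:02d}"


def monthly_index_body(primary_shards: int, replicas: int = 1) -> dict:
    """Create-index body in ES 1.x mapping syntax.

    Each month gets its own index with several primary shards, so a month's
    indexing and search load is spread over several nodes instead of one.
    The mapping type "doc" and the field names are hypothetical examples.
    """
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        },
        "mappings": {
            "doc": {
                "properties": {
                    # not_analyzed string + doc_values: values for terms
                    # aggregations are read from on-disk doc values (via the
                    # OS page cache) instead of being loaded as heap fielddata.
                    "client_id": {
                        "type": "string",
                        "index": "not_analyzed",
                        "doc_values": True,
                    },
                    # Dates can use doc_values too (range filters, histograms).
                    "published_at": {"type": "date", "doc_values": True},
                }
            }
        },
    }


def primary_shard_size_gb(month_primary_gb: float, primary_shards: int) -> float:
    """Average primary-shard size if one month's data is split evenly
    across `primary_shards` shards of that month's index."""
    if primary_shards < 1:
        raise ValueError("primary_shards must be at least 1")
    return month_primary_gb / primary_shards


if __name__ == "__main__":
    # 250GB per month is the upper end of the per-shard size quoted in the
    # thread; 8 primary shards per monthly index is an arbitrary example.
    print(monthly_index_name(datetime(2015, 4, 22)))  # docs-2015.04
    print(primary_shard_size_gb(250, 8))              # 31.25
```

With these example numbers, a month that currently lands in one ~250GB shard would instead be spread over eight ~31GB shards, which can live on different nodes. Dashboard queries for one month then target that month's index, and multi-month queries can list several indices or use a wildcard pattern such as `docs-2015.*`.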