The number of documents is not relevant to the search time. Important factors for search time are the type of query, shard size, the number of unique terms (the dictionary size), the number of segments, network latency, disk drive latency, ...
Maybe you mean equal distribution of docs with same average size across shards. This means a search does not have to wait for nodes that must search in larger shards.

I do not think this needs a river plugin, since equal distribution of docs over the shards is the default.

Jörg

On Tue, Apr 8, 2014 at 9:03 PM, Josh Harrison <[email protected]> wrote:

> I have heard that ideally, you want to have a similar number of documents
> per shard for optimal search times, is that correct?
>
> I have data volumes that are just all over the place, from 100k to tens of
> millions in a week.
>
> I'm thinking about a river plugin that could:
> Take a mapping object as a template
> Define a template for child index names (project_YYYY_MM_DD_NNN =
> project_2014_04_08_000, etc)
> Define index shard count (5)
> Define maximum index size (1,000,000)
> Define a listening endpoint of some sort
>
> Documents would stream into the listening endpoint however you wanted,
> rivers, bulk loads using an API, etc. They would be automatically routed to
> the lowest numbered not-full index. So on a given day you could end up with
> fifteen indexes, or eighty, or two, but they'd all be a maximum of N
> records.
>
> A plugin seems desirable in this case, as it frees you from needing to
> write the load balancing into every ingestion stream you've got.
>
> Is this a reasonable solution to this problem? Am I overcomplicating
> things?
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/176f4fb2-d924-4ec2-bcee-67ad8de24dfb%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
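
To illustrate why the default already spreads docs evenly: a document is assigned to a shard by hashing its routing value (by default the doc id) and taking the result modulo the number of primary shards. The sketch below only simulates that idea in plain Python. The names `shard_for` and `distribution` are made up for this example, and MD5 is a stand-in hash, not the function Elasticsearch actually uses.

```python
import hashlib
from collections import Counter


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard by hashing the id and taking it modulo the shard count.

    MD5 is only a deterministic stand-in hash for this simulation.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribution(num_docs: int, num_shards: int) -> Counter:
    """Count how many simulated docs land on each shard."""
    return Counter(shard_for(f"doc-{i}", num_shards) for i in range(num_docs))


# Simulate 100,000 docs routed into an index with 5 primary shards.
counts = distribution(100_000, 5)
for shard in sorted(counts):
    print(shard, counts[shard])
```

With a reasonably uniform hash, the per-shard counts come out close to each other no matter how many docs arrive in a given week, so there is no need for an extra balancing layer in front of the index.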
