That is correct, I was mixing the terms "nodes" and "shards" (sorry about 
that).  I'm running the test on a single node (machine).  I've chosen 20 
shards so we could eventually go to a 20-server cluster without 
re-indexing.  It's unlikely we'll ever need to go that high, but you never 
know, and given that we receive 750 million messages a day, the thought of 
reindexing after collecting a year's worth of data makes me nervous.  If I 
can "over-shard" and avoid a massive reindex then I'll be a happy guy.

I thought about reducing the 20 shards, but even if I go to, say, 5 shards 
across 5 machines (1 shard per machine?), I'll still run into the issue if 
a user searches several years back.  Any other thoughts on a possible 
solution?  Would increasing the queue size be a good option?  Is there a 
downside (performance hit, running out of resources, etc.)?
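
If I've read the docs correctly, the setting I'd be touching is the search 
thread pool queue in elasticsearch.yml, something along these lines (2000 
is only an illustrative value, not a recommendation):

  threadpool.search.queue_size: 2000

The default of 1000 matches the "queue capacity 1000" in the rejection 
message, and a year-wide search fans out to 365 indices * 20 shards = 7300 
shard requests, which is presumably why the queue overflows.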

Thanks again!

On Tuesday, February 25, 2014 11:32:26 PM UTC-8, David Pilato wrote:
>
> You are mixing nodes and shards, right?
> How many elasticsearch nodes do you have to manage your 7300 shards?
> Why did you set 20 shards per index?
>
> You can increase the queue size in elasticsearch.yml but I'm not sure it's 
> the right thing to do here.
>
> My 2 cents
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
>
> On 26 Feb 2014, at 01:36, Alex Clark <al...@bitstew.com> wrote:
>
> Hello all, I’m getting failed nodes when running searches and I’m hoping 
> someone can point me in the right direction.  I have indices created per 
> day to store messages.  The pattern is pretty straightforward: the index 
> for January 1 is "messages_20140101", for January 2 is "messages_20140102" 
> and so on.  Each index is created against a template that specifies 20 
> shards. A full year will give 365 indices * 20 shards = 7300 nodes. I have 
> recently upgraded to ES 1.0.
>
> When I search for all messages in a year (either using an alias or 
> specifying “messages_2013*”), I get many failed nodes.  The reason given 
> is: “EsRejectedExecutionException[rejected execution (queue capacity 
> 1000) on 
> org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4@651b8924]”).
>   
> The more often I search, the fewer failed nodes I get (probably caching in 
> ES) but I can’t get down to 0 failed nodes.  I’m using ES for analytics so 
> the document counts coming back have to be accurate. The aggregate counts 
> will change depending on the number of node failures.  We use the Java API 
> to create a local node to index and search the documents.  However, we also 
> see the issue if we use the URL search API on port 9200.
>
> If I restrict the search to 30 days then I do not see any failures (it’s 
> under 1000 nodes, so that's expected).  However, it is a pretty common use case 
> for our customers to search messages spanning an entire year.  Any 
> suggestions on how I can prevent these failures?  
>
> Thank you for your help!
>
