The missing filter is fairly costly. I do not believe you need it, as > 0 
should already take care of excluding nulls: a range filter only matches 
documents that actually have a value in the field.
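
For illustration, here is a rough sketch of what the request body could look 
like with the range check moved into the query filter and no missing filter 
at all. It assumes the ES 1.x "filtered" query syntax; the field name 
"timestamp", the aggregation name "per_day" and the "day" interval are my 
assumptions, only "duration" comes from your mails.

```python
import json

# Sketch of a search body for ES 1.x (assumed syntax, adjust to your version).
# "duration" is the field from the thread; "timestamp" and "per_day" are
# hypothetical names chosen for this example.
search_body = {
    "size": 0,  # only the aggregation result is needed, not the hits
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            # A range filter only matches documents that have a value in the
            # field, so documents with a missing or null duration drop out.
            "filter": {"range": {"duration": {"gt": 0}}},
        }
    },
    "aggs": {
        "per_day": {
            "date_histogram": {"field": "timestamp", "interval": "day"}
        }
    },
}

print(json.dumps(search_body, indent=2))
```

Because the filter sits in the query, the aggregation only sees documents 
that already passed the > 0 check.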

One thread can act on one shard at a time, so the only way to parallelize 
your query is to split the index into more shards and let multiple threads 
work in parallel on smaller shards. So if your server has, say, 16 cores, 
you may consider roughly the same number of shards (maybe a bit fewer).

If it is IO bound rather than CPU bound, more memory for OS-level caching 
and probably bumping up the ES heap as well could help, as would faster 
storage. SSDs work great with ES, and at some point you may need to have 
several nodes.
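
As a rough sketch of the "cores minus a bit" rule of thumb for shard count: 
the helper name suggest_shard_count and the default of two reserved cores are 
made up for this example, not an official recommendation. Keep in mind that 
the shard count is set when the index is created, so changing it means 
reindexing.

```python
def suggest_shard_count(cpu_cores, reserved_cores=2):
    """Hypothetical helper: roughly one shard per core, minus a few cores
    left free for indexing, merges and the OS. Never returns less than 1."""
    return max(1, cpu_cores - reserved_cores)


# Settings body that could be sent when creating the index.
index_settings = {
    "settings": {
        "number_of_shards": suggest_shard_count(16),
    }
}

print(index_settings)
```

Treat the result as a starting point and measure the query time with a few 
different shard counts on your own hardware.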

I believe reducing date precision would decrease the number of unique terms 
in the index and may help with the histogram. Say, if your histogram only 
needs the date and not the time, I would not even index the time part (note 
that you may use a multi-field mapping if you need both the precise and the 
date-rounded timestamp).
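
A minimal sketch of reducing the precision on the client side before 
indexing, using the timestamp format from your mail. The function names 
round_to_minute and date_only are hypothetical, and the date is taken in the 
timestamp's own UTC offset.

```python
from datetime import datetime


def round_to_minute(ts):
    """Drop seconds and milliseconds, keeping the original UTC offset."""
    dt = datetime.fromisoformat(ts)
    truncated = dt.replace(second=0, microsecond=0)
    return truncated.isoformat(timespec="milliseconds")


def date_only(ts):
    """Keep only the calendar date (in the timestamp's own offset)."""
    return datetime.fromisoformat(ts).date().isoformat()


original = "2014-01-15T14:17:06.245+01:00"
print(round_to_minute(original))  # 2014-01-15T14:17:00.000+01:00
print(date_only(original))        # 2014-01-15
```

With minute precision there are at most 1440 distinct values per day, and 
with date-only precision there is exactly one per day.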


On Sunday, December 7, 2014 5:14:17 AM UTC-5, [email protected] wrote:
>
>> How many docs do you expect your histogram will aggregate? Most of your 
>> 111M? If so with just one shard and one thread doing the work it is bound 
>> to be pretty slow. 
>>
>
> About 78 million records are expected to be aggregated. After reindexing 
> with 6 shards per index the query time dropped by ~50%. The result was 
> surprising: someone wrote that several shards on a single disk have less 
> effect, because they share the same I/O. But I should mention the 
> threading effect. Are there recommendations about shard size vs. shard 
> count?
>  
>
>> Also, have you tried moving your not-missing filter out of the agg into 
>> the query filter, and also just using > 0 instead of not missing? 
>> Reducing the precision of the timestamp could possibly help as well.
>
>
> Removing the missing filter from the query gives more speed. I cannot 
> remember why I used this missing filter. In the current test setup the 
> result set is identical with or without the 'missing filter'. Is there 
> any need to use the 'missing filter' here? What happens if the field 
> 'duration' is missing or null in some records?
>
> What is your recommendation for the timestamp? Should I replace 
>
> 2014-01-15T14:17:06.245+01:00
>
> with a less precise value, rounded to the minute,
>
> 2014-01-15T14:17:00.000+01:00
>
> ? Would this affect the field data cache?
>
