The missing filter is fairly costly. I do not believe you need it, as > 0 should take care of excluding nulls.
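
To make that concrete, here is a rough sketch of what the request body could look like with the range check moved into the query filter and no missing filter at all. It assumes the 1.x-style `filtered` query; only `duration` comes from this thread, while the `timestamp` field name, the `per_interval` aggregation name and the `day` interval are placeholders I made up, since the original query is not shown here.

```python
import json


def build_histogram_query(duration_field="duration",
                          timestamp_field="timestamp",  # hypothetical field name
                          interval="day"):              # hypothetical interval
    """Build a search body with the > 0 check in the query filter.

    No missing filter is used: the range filter alone is expected to
    skip documents that have no value for the duration field.
    """
    return {
        "size": 0,  # only the aggregation result is wanted, no hits
        "query": {
            "filtered": {
                "filter": {
                    "range": {duration_field: {"gt": 0}}
                }
            }
        },
        "aggs": {
            "per_interval": {  # hypothetical aggregation name
                "date_histogram": {
                    "field": timestamp_field,
                    "interval": interval,
                }
            }
        },
    }


body = build_histogram_query()
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a search request; adjust the field names and interval to match your mapping.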

Only one thread works on a shard at a time for a given query, so the only way to parallelize your query is to split the index into more shards and let multiple threads work in parallel on smaller shards. So if your server has, say, 16 cores, you may consider roughly the same number of shards (maybe a bit fewer).

If it is IO bound rather than CPU bound, more memory for OS-level caching and probably bumping up the ES heap as well could help, as would faster storage: SSDs work great with ES. At some point you may need several nodes.

I believe reducing date precision would decrease the number of unique terms in the index and may help with the histogram. Say, if your histogram only needs the date and not the time, I would not even index the time part (note that you can use a multifield mapping if you need both the precise and the date-rounded timestamp).

On Sunday, December 7, 2014 5:14:17 AM UTC-5, [email protected] wrote:
>
>> How many docs do you expect your histogram will aggregate? Most of your
>> 111M? If so with just one shard and one thread doing the work it is bound
>> to be pretty slow.
>>
>
> Expected aggregated records are 78mio. After reindexing with 6 shards per
> index the query time reduced by ~50%. The result was surprising: someone
> wrote several shards on a single disk have less effect, because they share
> the same i/o. But I should mention the threading effect. Are there
> recommendations about shard size vs shard count?
>
>
>> Also have you tried moving your not missing filter out of the agg into
>> the query filter and also just using > 0 instead of not missing. Also
>> reducing precision of the timestamp could possible help
>
>
> Removing the missing filter out of the query gives more speed. I cannot
> remember why I used this missing filter. In current test setup the target
> result set is identical, even if using 'missing filter'. Is there need to
> use 'missing filter' here? What happens, if field 'duration' is missing or
> null in some records?
>
> What is your recommendation to timestamp? Should I replace
>
> 2014-01-15T14:17:06.245+01:00
>
> with less accuracy in minutes
>
> 2014-01-15T14:17:00.000+01:00
>
> ? Would this affect the field data cache?
>
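
For the rounding itself, a minimal sketch of truncating the timestamp before indexing, using the example value quoted above. The helper names are made up for illustration, and doing this on the client side before sending the document is just one way to go about it.

```python
from datetime import datetime, timedelta, timezone


def round_down_to_minute(ts):
    """Drop seconds and sub-second precision, keeping the UTC offset."""
    return ts.replace(second=0, microsecond=0)


def round_down_to_day(ts):
    """Drop the whole time part, keeping the date and the UTC offset."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


# 2014-01-15T14:17:06.245+01:00, the example from the question
utc_plus_one = timezone(timedelta(hours=1))
original = datetime(2014, 1, 15, 14, 17, 6, 245000, tzinfo=utc_plus_one)

minute_ts = round_down_to_minute(original).isoformat(timespec="milliseconds")
day_ts = round_down_to_day(original).isoformat(timespec="milliseconds")

print(minute_ts)  # 2014-01-15T14:17:00.000+01:00
print(day_ts)     # 2014-01-15T00:00:00.000+01:00
```

If the histogram only ever buckets by day, the day-rounded value gives the fewest unique terms; the minute-rounded one is the variant asked about above.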
