Aggregations-only use case - performance tuning via config possible?

Ben Mon, 28 Jul 2014 02:07:25 -0700

Hi there,

I am using ES to calculate aggregations on a dataset of sales data 
(about 50,000,000 docs, or 10 GB of data). For example, I use the date 
histogram aggregation with terms and sum sub-aggregations to get the sales 
sum per day and product. The documents have a product_id field, a date 
field and a quantity field, among others.
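
To make the intended result concrete, here is a small local sketch of what the 
date histogram -> terms -> sum nesting computes: the total quantity per 
calendar day and product_id. It is an illustration only, not how ES computes 
it. The field names come from the post; the document shape (ISO-8601 date 
strings) and the helper name are assumptions, and time zones are ignored 
(ES buckets dates in UTC by default).

```python
from collections import defaultdict
from datetime import datetime


def sales_per_day_and_product(docs):
    """Hypothetical helper: sum `quantity` per day and `product_id`.

    Mirrors the nesting date_histogram (day) -> terms (product_id) -> sum (quantity).
    Assumes each doc's `date` is an ISO-8601 string; time zones are ignored.
    """
    result = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        # Bucket by calendar day (the date_histogram level).
        day = datetime.fromisoformat(doc["date"]).date().isoformat()
        # Bucket by product, then accumulate the quantity (terms + sum levels).
        result[day][doc["product_id"]] += doc["quantity"]
    # Return plain dicts with days in ascending order, like histogram buckets.
    return {day: dict(products) for day, products in sorted(result.items())}


# Made-up sample documents for illustration.
docs = [
    {"product_id": "A", "date": "2014-07-27T10:15:00", "quantity": 3},
    {"product_id": "B", "date": "2014-07-27T11:00:00", "quantity": 1},
    {"product_id": "A", "date": "2014-07-27T18:30:00", "quantity": 2},
    {"product_id": "A", "date": "2014-07-28T09:00:00", "quantity": 4},
]
print(sales_per_day_and_product(docs))
```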

This use case has no live indexing (!). I bulk-index new sales data once a 
day, shortly after midnight, covering only the previous day; no new data is 
added during the rest of the day. I also do not use any search hits other 
than the aggregation results, so the result size is always set to 0 (zero) 
in my queries.
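
An aggregations-only request of the kind described might look roughly like 
the body below, built here as a Python dict and serialized to JSON. This is a 
sketch: the aggregation names (sales_per_day, per_product, quantity_sum) and 
the helper function are hypothetical, and the "interval" key is the ES 1.x 
form of date_histogram (newer versions use "calendar_interval" instead).

```python
import json


def build_sales_query(interval="day"):
    """Hypothetical helper: build an aggregations-only search body.

    size=0 means no hits are returned, only the aggregation results.
    """
    return {
        "size": 0,
        "aggs": {
            "sales_per_day": {
                # ES 1.x syntax; newer versions use "calendar_interval".
                "date_histogram": {"field": "date", "interval": interval},
                "aggs": {
                    "per_product": {
                        "terms": {"field": "product_id"},
                        "aggs": {
                            "quantity_sum": {"sum": {"field": "quantity"}}
                        },
                    }
                },
            }
        },
    }


body = build_sales_query()
print(json.dumps(body, indent=2))
```

Note that the terms aggregation returns only the top 10 buckets by default, 
so its size may need adjusting if more than 10 products can occur per day.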

My machine has 128 GB RAM (about 75 GB allocated to ES via ES_MIN_MEM / 
ES_MAX_MEM), 12 cores and SSD disks.
I am using 1 shard and 0 replicas (no cluster; this is a single, isolated 
machine).
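
For reference, the shard/replica layout described above corresponds to index 
settings like the following, sent as the body of the index creation request. 
This is a minimal sketch; the index name and any other settings are left out 
and would be assumptions.

```python
import json

# One primary shard and no replicas, matching a single, isolated node.
index_settings = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0,
    }
}
print(json.dumps(index_settings))
```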

My aim is to make the aggregation calculations as fast as possible. Are 
there any recommended configuration settings for ES or for the indexes?

Another question: is there a way to silence the bulk indexing logs 
completely (I am using Jörg Prante's JDBC plugin)? I was unable to find the 
right setting to do that.

Thank you!
Ben

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a327d680-8917-41f2-83e3-ad013c94788a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.