To clarify, these questions come from my desire to produce real-time 
aggregated information from a "stream", which in this case is the metric 
data we're feeding into ES.  I'm concerned about unnecessarily re-executing 
aggregations over (potentially large) data sets when the same results could 
be computed more cheaply by maintaining rollup buckets that are simply 
updated as data enters ES.  I'm not sure whether there's a good pattern for 
this or whether I'm better off using a different technology entirely (e.g. 
Storm), though it is nice having all my logs and metrics queryable in one 
place.
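
For concreteness, here's a rough (untested) sketch of what I mean by 
maintaining buckets, using the official Python client and the Update API's 
scripted upsert: one rollup document per host per minute, incremented as 
each metric arrives.  The index/type/field names are made up, and it 
assumes ES 1.x with dynamic scripting enabled.

from datetime import datetime
from elasticsearch import Elasticsearch

es = Elasticsearch()

def record_request(host, count=1):
    # One rollup doc per host per minute (placeholder index/field names).
    # The script increments the counter if the doc already exists; the
    # upsert creates it the first time (ES 1.x-style scripted update).
    minute = datetime.utcnow().strftime("%Y-%m-%dT%H:%M")
    es.update(
        index="metrics-rollup",
        doc_type="request_rate",
        id="%s-%s" % (host, minute),
        body={
            "script": "ctx._source.requests += count",
            "params": {"count": count},
            "upsert": {"host": host, "minute": minute, "requests": count},
        },
    )

The obvious downside is that every incoming metric turns into an extra 
update call, which is why I'm wondering whether there's a better-established 
pattern for this.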


On Thursday, June 5, 2014 12:51:52 PM UTC-4, erewh0n wrote:
>
> I've recently started using and enjoying ES; in particular, I'm keen to 
> exploit the new aggregations feature to report on the system metrics data 
> that is currently being fed into ES indexes.
>
> I'm experimenting with aggregations that fold up things like request rates 
> per machine or API calls (per machine, globally, etc).  I was thinking that 
> it might be useful to store the aggregation result itself, particularly if 
> I set a (let's say) weekly TTL on the incoming metrics data but would like 
> to preserve historical aggregates (e.g. find me the average/min/max request 
> rate on day 17).  I might want to keep the raw metrics for a week, but the 
> aggregates should potentially stick around for years.
>
> Are there any recommended patterns for dealing with these scenarios?  Are 
> there existing means of recomputing aggregates at regular intervals and 
> emitting them back into ES?
>
> Thanks!
>
>
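
To put the second question in concrete terms, the kind of scheduled rollup 
job I have in mind would look something like the following (again untested, 
with placeholder index and field names): re-run the aggregation over a 
finished day of raw metrics, then write each bucket back as a small document 
in a long-lived index that never gets a TTL.

from elasticsearch import Elasticsearch

es = Elasticsearch()

def rollup_request_rates(day):
    # day is a date string like "2014-06-05"; run once per day from cron.
    resp = es.search(
        index="metrics-raw",
        search_type="count",  # aggregations only, no hits (ES 1.x)
        body={
            "query": {"range": {"@timestamp": {"gte": day, "lte": day}}},
            "aggs": {
                "per_host": {
                    "terms": {"field": "host"},
                    "aggs": {"rate": {"stats": {"field": "request_rate"}}},
                }
            },
        },
    )
    # Persist each per-host bucket as its own document in the rollup index.
    for bucket in resp["aggregations"]["per_host"]["buckets"]:
        stats = bucket["rate"]
        es.index(
            index="metrics-daily",
            doc_type="request_rate",
            id="%s-%s" % (bucket["key"], day),
            body={
                "host": bucket["key"],
                "day": day,
                "count": stats["count"],
                "avg": stats["avg"],
                "min": stats["min"],
                "max": stats["max"],
            },
        )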
