Hi,
    We are currently using Elasticsearch to analyze the last 24 hours of
data. Documents arrive at a rate of a few hundred per 10-second interval,
and each document carries a timestamp.
    We now need to analyze data over a full week. To limit the space
required, we plan to keep the 24-hour TTL on the raw documents but also
aggregate them into one document per minute, so that queries for data older
than 24 hours and up to 7 days are served from these rollups. All fields in
the document need to be aggregated.
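To make the scheme concrete, here is a minimal sketch of how a raw document's timestamp could be truncated to its minute bucket, which would become the key of the per-minute rollup document (the function name and the use of epoch-seconds timestamps are assumptions for illustration, not part of our current schema):

```python
from datetime import datetime, timezone

def minute_bucket(epoch_seconds: float) -> str:
    """Truncate an epoch timestamp to its minute boundary, as an ISO string (UTC)."""
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return dt.replace(second=0, microsecond=0).isoformat()

# e.g. every raw document whose timestamp falls inside the same minute
# maps to the same bucket key, and hence the same rollup document id
```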

    So,
1. Are there any out-of-the-box features I can use to achieve this kind
of roll-up?
2. What is the best approach (preferably a time-tested one, if someone has
already done this)?

    Some approaches we were contemplating:
1. Aggregating the data in real time (outside ES) and storing the aggregated
data in ES
2. Periodically (say, every 30 minutes) running aggregation queries and
writing the responses back to ES
3. Periodically (say, every 30 minutes) reading new documents by time range,
aggregating them, and storing the aggregated data back into ES in bulk,
perhaps using streaming or paged reads of the documents
4. A combination of 1 and (2 or 3), so that real-time data gets aggregated
immediately and data that arrives late (which may happen) can be merged into
the existing aggregates using the ES Update API
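For approach 3, the client-side aggregation step might look like the sketch below: fold a batch of raw documents into one rollup per minute, tracking count/sum/min/max per field. This is a standalone illustration under assumed document shapes (`ts` in epoch seconds, a single numeric `value` field); the real version would read the batch from ES with a time-range query and write the result back via a bulk request:

```python
from collections import defaultdict

def roll_up(docs):
    """Fold raw documents into one rollup document per minute.

    Each raw doc is assumed to look like {"ts": <epoch seconds>, "value": <number>};
    real documents would have more fields, each aggregated the same way.
    """
    buckets = defaultdict(
        lambda: {"count": 0, "sum": 0.0, "min": float("inf"), "max": float("-inf")}
    )
    for doc in docs:
        minute = int(doc["ts"] // 60) * 60  # truncate timestamp to minute boundary
        b = buckets[minute]
        b["count"] += 1
        b["sum"] += doc["value"]
        b["min"] = min(b["min"], doc["value"])
        b["max"] = max(b["max"], doc["value"])
    # one document per minute, ready to feed into a bulk index request
    return [{"ts": minute, **stats} for minute, stats in sorted(buckets.items())]
```

Using the minute boundary as the rollup document's id would also make approach 4 workable: a late-arriving document maps to an existing rollup id, which can then be merged with an upsert-style update rather than creating a duplicate.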

Thanks for any advice,
Srinath.
