We have the following use case:
We receive a large amount of data every day. Most queries (99% of the time) run 
on that day's or that week's data. Queries on data older than a week are rare, 
but they do happen.

One week's data is under a few terabytes, but total data over time would run 
into petabytes. To reduce total cost, we want to avoid storing all data in 
local storage and also avoid running a Druid cluster with hundreds of nodes. 
Instead, we would like to keep only the relevant data in local storage and move 
the rest to cheaper deep storage such as S3, loading data from deep storage on 
demand, only when a time-series query requests it. We think this would let us 
run and manage a smaller on-prem Druid cluster with much less local storage 
than deep storage.
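For context on why we believe the current retention mechanism does not cover this: as we understand it, the coordinator's load rules (e.g. a loadByPeriod rule followed by dropForever) can keep only the last week of segments on historicals, but segments dropped this way are no longer queryable at all; they are not pulled back from deep storage on demand. A sketch of such a rule chain, based on our reading of the docs (please correct us if the semantics are different):

```json
[
  {
    "type": "loadByPeriod",
    "period": "P1W",
    "tieredReplicants": { "_default_tier": 2 }
  },
  {
    "type": "dropForever"
  }
]
```

With rules like these, queries over the last week are served from local segments, but a query touching older intervals simply finds no loaded segments for them, which is the gap we are asking about.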

From our testing and from reading the available material on Druid, it looks 
like this is not possible today. Please correct me if I am wrong.
Also, would a feature like this fit into the Druid product roadmap? Are there 
any pitfalls or reasons that would make it a bad idea? Was it considered 
earlier, spec'ed out, and then dropped for some reason? If it is merely a 
matter of development effort, we don't mind doing the work.
Comments from the Druid development community would be highly appreciated. 
Thanks!
