gianm commented on issue #6105: Allow sorting segments on some dims before time
URL: 
https://github.com/apache/incubator-druid/issues/6105#issuecomment-414080582
 
 
   Fwiw, some situations where the ability to sort by something other than time 
would be useful:
   
   1. Timeseries data (ironically). Timeseries data is usually modeled as 
"series" of "points" where each series has a "metric" (like its name) and 
"tags" that can be used to differentiate it from other series with the same 
"metric". In Druid you'd model this by making each point into a row, and making 
the metric and tags into dimensions. It's best to store these rows sorted by 
metric, so the rows for a particular metric compress better (since they are 
likely to have a lot of the same tag values), and so we can retrieve all the 
points for a series faster (since they will have better locality of storage).
   2. Clickstream data, when you want to do session analysis on it. The idea 
would be to partition by day first, then by session id (we already support 
this: segmentGranularity DAY, and single-dimension partitioning via Hadoop 
indexing). Then, within a segment, sort by session id. That makes it possible to 
run queries like "count the number of sessions where X, then Y, then Z happened" 
in linear time and constant memory.
   3. Multi-tenant datasets, where you store data for different tenants in the 
same Druid dataSource. In this case you'd want to both partition and sort by 
tenant_id. It should improve both compression ratio and query time.
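
To illustrate the session case (2), here is a minimal Python sketch of a single-pass "X, then Y, then Z" count. It is not Druid code: the function name `count_funnel_sessions` and the `(session_id, event)` row layout are made up for illustration. It assumes rows arrive sorted by session id and, within a session, in time order.

```python
from typing import Iterable, Sequence, Tuple


def count_funnel_sessions(
    rows: Iterable[Tuple[str, str]], steps: Sequence[str]
) -> int:
    """Count sessions in which the events in `steps` occur in order.

    Assumes `rows` yields (session_id, event) pairs sorted by session_id,
    and in time order within each session (hypothetical layout).
    """
    count = 0
    current_session = None
    progress = 0  # number of steps already matched in the current session

    for session_id, event in rows:
        if session_id != current_session:
            # Sorted input: a new id means the previous session is finished,
            # so its state can simply be discarded.
            current_session = session_id
            progress = 0
        if progress < len(steps) and event == steps[progress]:
            progress += 1
            if progress == len(steps):
                count += 1  # reached at most once per session

    return count


rows = [
    ("s1", "X"), ("s1", "Y"), ("s1", "Z"),               # matches
    ("s2", "Y"), ("s2", "X"), ("s2", "Z"),               # Y never follows X
    ("s3", "X"), ("s3", "A"), ("s3", "Y"), ("s3", "Z"),  # matches, A is skipped
]
print(count_funnel_sessions(rows, ["X", "Y", "Z"]))  # prints 2
```

Each row is visited once, and the only state kept is the current session id, a step counter, and the running total. Without the sort, rows for one session could be interleaved with others, and you'd need per-session state for every open session.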

