clintropolis opened a new pull request, #19460:
URL: https://github.com/apache/druid/pull/19460

   ### Description
   This PR builds on the foundations laid by projections (#17214) and the v10 
segment format (#18880) to introduce 'clustered' segments, which give operators 
the option to push a `CLUSTERED BY` clause _inside of a segment_, as a 
companion to partitioning, where data is distributed between segments in this 
manner. Internally, the 'base' table is decomposed into separate cluster groups, 
which are concatenated to form the 'complete' view of all rows stored in the 
segment. This optimizes for use cases where typical queries filter down to a 
small subset of the cluster groups (ideally a single group), which, like 
aggregate projections, can greatly reduce the number of rows to be scanned. The 
expected use cases are things like multi-tenant clusters with a shared 
datasource, metrics use cases which typically filter to a single type of 
service, etc.
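
   To make the layout concrete, here is a minimal sketch of the idea described 
above: rows are stored per cluster group, the complete view is the concatenation 
of the groups, and a filter on the clustering value only touches the matching 
group. This is an illustration under assumptions, not the code in this PR; 
`ClusterGroupSketch` and its methods are hypothetical names and do not 
correspond to any Druid class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names are Druid classes.
final class ClusterGroupSketch {
    // Rows of each cluster group, keyed by the clustering value (e.g. a tenant id).
    // LinkedHashMap keeps the groups in insertion order.
    private final Map<String, List<String>> groups = new LinkedHashMap<>();

    // Registers one cluster group and its rows.
    void addGroup(String clusterValue, List<String> rows) {
        groups.put(clusterValue, new ArrayList<>(rows));
    }

    // The 'complete' view of the segment: all groups concatenated in group order.
    List<String> allRows() {
        List<String> out = new ArrayList<>();
        for (List<String> rows : groups.values()) {
            out.addAll(rows);
        }
        return out;
    }

    // A filter on the clustering value only needs to scan the matching group;
    // rows of every other group are never read.
    List<String> rowsFor(String clusterValue) {
        List<String> rows = groups.get(clusterValue);
        return rows == null ? Collections.<String>emptyList()
                            : Collections.unmodifiableList(rows);
    }
}
```

   In this sketch, a query filtering to a single tenant reads only that 
tenant's rows through `rowsFor`, while an unfiltered query falls back to 
`allRows` and sees the same rows a non-clustered table would hold.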
   
   This PR contains only the read side, to get feedback on the internal segment 
metadata shapes and query engine integration. The write side (ingestion 
support) will come in a follow-up PR, so this PR uses some test fixtures to 
exercise the read paths until segment building is actually in place.
   
   Since this is an experimental and mostly new feature, the most important 
part for reviewers is the new internal segment metadata, 
`ClusteredValueGroupsBaseTableSchema`, and its new internal types such as 
`TableClusterGroupSpec` and `ClusteringDictionaries`, so that we can ensure the 
metadata we will be storing in the segment is "good", since changing it after 
segments have been written is very hard.
   
   _todo: elaborate on design_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

