gianm opened a new pull request, #16849:
URL: https://github.com/apache/druid/pull/16849

   Currently, segments are always sorted by __time, followed by the sort order 
provided by the user via dimensionsSpec or CLUSTERED BY. Sorting by __time 
enables efficient execution of queries involving time-ordering or granularity. 
Time-ordering is a simple matter of reading the rows in stored order, and 
granular cursors can be generated in streaming fashion.
   
   However, for various workloads, it's better for storage footprint and query 
performance to sort by arbitrary orders that do not start with __time. With 
this patch, users can sort segments by such orders.
   
   For spec-based ingestion, users add "useExplicitSegmentSortOrder": true to
dimensionsSpec. The "dimensions" list determines the sort order. To define a 
sort order that includes "__time", users explicitly include a dimension named 
"__time".
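
   As an illustration, a dimensionsSpec that places "__time" second in the sort
order might look like the fragment below, built as a Python dict and printed as
JSON. The dimension names "country" and "userId" are hypothetical, and writing
the "__time" entry as a typed object (rather than a bare name) is an assumption
of this sketch, not something stated in this description.

```python
import json

# Hypothetical dimension names; the flag and the "__time" entry follow the text above.
dimensions_spec = {
    "useExplicitSegmentSortOrder": True,
    "dimensions": [
        "country",
        {"type": "long", "name": "__time"},  # assumed typed form of the __time entry
        "userId",
    ],
}

# Serialize to the JSON that would sit inside an ingestion spec.
print(json.dumps(dimensions_spec, indent=2))
```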
   
   For SQL-based ingestion, users set the context parameter
"useExplicitSegmentSortOrder" to true. The CLUSTERED BY clause is then used as
the explicit segment sort order.
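
   A rough sketch of such a request, assembled as a Python dict: the datasource
names "events" and "events_sorted" and the column names are hypothetical; only
the context key and the use of CLUSTERED BY come from the description above.

```python
import json

# Hypothetical datasource and column names; the context key follows the text above.
sql = """
REPLACE INTO "events_sorted" OVERWRITE ALL
SELECT "__time", "country", "userId"
FROM "events"
PARTITIONED BY DAY
CLUSTERED BY "country", "__time"
"""

# The query text and its context travel together in one request body.
payload = {
    "query": sql.strip(),
    "context": {"useExplicitSegmentSortOrder": True},
}

print(json.dumps(payload, indent=2))
```

   With the flag set, the intended segment sort order here would be "country"
followed by "__time".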
   
   In both cases, when the new "useExplicitSegmentSortOrder" parameter is false 
(the default), __time is implicitly prepended to the sort order, as it always
was prior to this patch.
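
   The rule can be sketched in a few lines of Python. The function name is
hypothetical and this is not the actual implementation. Moving a listed "__time"
to the front when the flag is false is an assumption of this sketch; the
description does not say how that combination is handled.

```python
def effective_sort_order(columns, use_explicit=False):
    """Return the segment sort order implied by a column list and the flag."""
    if use_explicit and "__time" in columns:
        # Explicit order that names __time: use the list exactly as given.
        return list(columns)
    # Flag off, or __time not listed: __time leads the sort order.
    return ["__time"] + [c for c in columns if c != "__time"]


print(effective_sort_order(["country", "userId"]))
# ['__time', 'country', 'userId']
print(effective_sort_order(["country", "__time", "userId"], use_explicit=True))
# ['country', '__time', 'userId']
```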
   
   The new parameter is experimental for two main reasons. First, such segments 
can cause errors when loaded by older servers, due to violating their 
expectations that timestamps are always monotonically increasing. Second, even 
on newer servers, not all queries can run on non-time-sorted segments. Scan 
queries involving time-ordering and any query involving granularity will not 
run. (To partially mitigate this, a currently-undocumented SQL context parameter
"sqlUseGranularity" is provided. When set to false, the SQL planner avoids using
granularities other than ALL.)
   
   Changes on the write path:
   
   1) DimensionsSpec can now optionally contain a __time dimension, which
      controls the placement of __time in the sort order. If not present,
      __time is considered to be first in the sort order, as it has always
      been.
   
   2) IncrementalIndex and IndexMerger are updated to sort facts more
      flexibly; not always by time first.
   
   3) Metadata (stored in metadata.drd) gains a "sortOrder" field.
   
   4) MSQ can generate range-based shard specs even when not all columns are
      singly-valued strings. It merely stops accepting new clustering key
      fields when it encounters the first one that isn't a singly-valued
      string. This is useful because it enables range shard specs on
      "someDim" to be created for clauses like "CLUSTERED BY someDim, __time".
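
   The prefix rule in item 4 can be sketched as follows. The KeyField type and
function name are hypothetical and do not mirror MSQ's actual classes; treating
an empty result as "no range-based shard spec" is also an assumption.

```python
from dataclasses import dataclass
from itertools import takewhile


@dataclass(frozen=True)
class KeyField:
    name: str
    column_type: str          # e.g. "string" or "long"
    multi_valued: bool = False


def range_shard_dimensions(clustering_key):
    """Keep the leading run of singly-valued string fields; stop at the first other field."""
    def usable(field):
        return field.column_type == "string" and not field.multi_valued

    return [field.name for field in takewhile(usable, clustering_key)]


# CLUSTERED BY someDim, __time
key = [KeyField("someDim", "string"), KeyField("__time", "long")]
print(range_shard_dimensions(key))  # ['someDim']
```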
   
   Changes on the read path:
   
   1) Add StorageAdapter#getSortOrder so query engines can tell how a
      segment is sorted.
   
   2) Update QueryableIndexStorageAdapter, IncrementalIndexStorageAdapter,
      and VectorCursorGranularizer to throw errors when using granularities
      on non-time-ordered segments.
   
   3) Update ScanQueryEngine to throw an error when using the time-ordering
      "order" parameter on non-time-ordered segments.
   
   4) Update TimeBoundaryQueryRunnerFactory to perform a segment scan when
      running on a non-time-ordered segment.
   
   5) Add "sqlUseGranularity" context parameter that causes the SQL planner
      to avoid using granularities other than ALL.
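
   The read-path checks above can be pictured with a small Python sketch. All
names are hypothetical and none of this is the Java code in the patch. It
assumes that "time-ordered" means the sort order begins with "__time", that
granularity "all" needs no time ordering, and that rows are plain dicts.

```python
class NonTimeOrderedSegmentError(Exception):
    """Raised when a query needs time ordering the segment cannot provide."""


def is_time_ordered(sort_order):
    # Assumption: a segment is time-ordered when its sort order starts with __time.
    return bool(sort_order) and sort_order[0] == "__time"


def check_granularity(sort_order, granularity):
    # Assumption: granularity "all" does not depend on time ordering.
    if granularity != "all" and not is_time_ordered(sort_order):
        raise NonTimeOrderedSegmentError(
            f"granularity {granularity!r} requires a time-ordered segment"
        )


def check_scan_order(sort_order, order):
    # "none" requests no particular ordering, so it is always allowed.
    if order != "none" and not is_time_ordered(sort_order):
        raise NonTimeOrderedSegmentError(
            f"scan order {order!r} requires a time-ordered segment"
        )


def time_boundary(rows, sort_order):
    """Return (min, max) of __time, scanning every row only when necessary."""
    if not rows:
        return None
    if is_time_ordered(sort_order):
        # First and last rows hold the boundaries when __time leads the sort.
        return rows[0]["__time"], rows[-1]["__time"]
    times = [row["__time"] for row in rows]
    return min(times), max(times)


rows = [{"__time": 30, "country": "AU"}, {"__time": 10, "country": "US"}]
print(time_boundary(rows, ["country", "__time"]))  # (10, 30)
```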
   
   Other changes:
   
   1) Rename DimensionsSpec "hasCustomDimensions" to "hasFixedDimensions"
      and change the meaning subtly: it now returns true if the DimensionsSpec
      represents an unchanging list of dimensions, or false if there is
      some discovery happening. This is what call sites had expected anyway.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

