clintropolis commented on code in PR #18262:
URL: https://github.com/apache/druid/pull/18262#discussion_r2221488866
##########
processing/src/main/java/org/apache/druid/data/input/impl/AggregateProjectionSpec.java:
##########
@@ -175,8 +174,8 @@ public String toString()
private static ProjectionOrdering computeOrdering(VirtualColumns
virtualColumns, List<DimensionSchema> groupingColumns)
{
if (groupingColumns.isEmpty()) {
- // call it time ordered, there is no grouping columns so there is only 1
row for this projection
- return new ProjectionOrdering(Cursors.ascendingTimeOrder(), null);
+ // no ordering since there is only 1 row for this projection
+ return new ProjectionOrdering(List.of(), null);
Review Comment:
>ah do you mean that if a query is bucket at DAY(__time), and segment is
bucket at HOUR(__time), then the OrderBy in projection doesn't matter since
it'd be grouped anyway?
Ish, yea, the projection transformation of the __time column must be finer
than or equal to the query transformation of the time_column for the projection
to be able to compute the same query results, so a time_floor(__time, 'P1D')
would group all of the hourly segment times into the same day (even if the
projection has no __time transformation because all of the rows would still
have a timestamp of the beginning of the hour or whatever).
>I'm thinking that projection should inherit the granularity from segment,
which it seems like it doesn't have that info right now?
A projection doesn't have an explicit granularity, which was a deliberate
design decision. It either has a precomputed virtual column with a time_floor
expression on __time (or even just __time directly from the base table) which
is stored as a physical column in the segment, or it doesn't. The latter case
as far as the engines are concerned still has the __time column due to the
constraints of the segment interval, it just isn't a physical column and
instead is a constant of min timestamp. The projection schema stored in the
metadata defines which physical columns are present in the segment for a given
projection, so a projection itself shouldn't really inherit a segment
granularity directly if that is what you mean. Instead, the match code should
consider the interval of the segment to determine if the projection __time
transformation matter or not (probably updating `PhysicalColumnChecker` to add
a method to get the data interval? though i havent thought too hard about it
yet)
In general, we have been downplaying the concept of granularity since it is
basically an oddly specific way to define a grouping on time for a query, and
the query engine implementations of it only work when a cursor is time ordered
(we have some plans to transition this to virtual columns too but haven't got
around to it yet).
> it should also work with DESC __time too if the query bucket is larger?
the descending thing isn't really directly related to __time matching, and
would handled in a different place in code, i think by just dropping the
direction from consideration. A query needing a descending order on __time
should work if the base table or a projection is ordered by __time in either
direction since the cursor can be done ascending or descending order (vector
cursor is not yet, but that isn't this codes concern really). The other
constraints of projection __time matching must also be true (such as finer than
or equal to __time transformation), but the code that is done via the virtual
column comparison because the ordering isn't part of the time_floor expression
itself. While we don't have any implementations of it yet, the same would also
be true if we had engines that could take advantage of ordering on things other
than __time, if we need `order by x desc` then as long as the table is `order
by x` in any direction we could use it for the query
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]