clintropolis commented on code in PR #18215:
URL: https://github.com/apache/druid/pull/18215#discussion_r2193628049
##########
processing/src/test/java/org/apache/druid/segment/CursorFactoryProjectionTest.java:
##########
@@ -1102,6 +1102,61 @@ public void
testQueryGranularityFitsProjectionGranularity()
}
}
+ @Test
+ public void testQueryGranularityLargerProjectionGranularity()
+ {
+ final GroupByQuery.Builder queryBuilder =
+ GroupByQuery.builder()
+ .setDataSource("test")
+ .setInterval(Intervals.ETERNITY)
+ .addAggregator(new LongSumAggregatorFactory("c_sum", "c"));
+ if (sortByDim) {
Review Comment:
>i thought this meant query has a sortBy at first, but i guess here's just
we want to test out time as granularity & virtual column? might worth some
comments.
yea, `sortByDim` is whether or not the segment we are querying is stored
sorted first by `__time` or not, its one of the test parameters which controls
if we set the `forceSegmentSortByTime` parameter on `DImensionsSpec` added in
#16849. When this is set, using a granularity on a query doesn't work because
the assumptions the engines make about the segments when using granularity
(which acts like an implicit group by time_floor(__time, granularity)) are that
the rows of the cursor are ordered by __time and so it can just scan and
collect the time bucket group of the granularity as it goes instead of grouping
on it properly as if it were a dimension. So for group by, we have to
explicitly make a time_floor and group on it instead of using the query
granularity. Timeseries queries don't work either if the segment isn't first
sorted by __time, so some of the tests also just skip if the segment isn't time
sorted. I can rename this parameter to `segmentSortedByTime` or something and it
might be clearer?
>also, some projections are named as daily, e.x. abfoo_daily but it's not
grouping by DAY right?
ah I guess I could rename them to leave the granularity off of the implicit
ones, some of those are projections without a __time equivalent column, but
they still live within the constraints of the segment granularity, so its an
implicit time_floor of day in this case since that is when the segment interval
starts.
##########
processing/src/test/java/org/apache/druid/segment/CursorFactoryProjectionTest.java:
##########
@@ -1102,6 +1102,61 @@ public void
testQueryGranularityFitsProjectionGranularity()
}
}
+ @Test
+ public void testQueryGranularityLargerProjectionGranularity()
+ {
+ final GroupByQuery.Builder queryBuilder =
+ GroupByQuery.builder()
+ .setDataSource("test")
+ .setInterval(Intervals.ETERNITY)
+ .addAggregator(new LongSumAggregatorFactory("c_sum", "c"));
+ if (sortByDim) {
Review Comment:
>i thought this meant query has a sortBy at first, but i guess here's just
we want to test out time as granularity & virtual column? might worth some
comments.
yea, `sortByDim` is whether or not the segment we are querying is stored
sorted first by `__time` or not, its one of the test parameters which controls
if we set the `forceSegmentSortByTime` parameter on `DImensionsSpec` added in
#16849. When this is set, using a granularity on a query doesn't work because
the assumptions the engines make about the segments when using granularity
(which acts like an implicit group by time_floor(__time, granularity)) are that
the rows of the cursor are ordered by __time and so it can just scan and
collect the time bucket group of the granularity as it goes instead of grouping
on it properly as if it were a dimension. So for group by, we have to
explicitly make a time_floor and group on it instead of using the query
granularity. Timeseries queries don't work either if the segment isn't first
sorted by __time, so some of the tests also just skip if the segment isn't time
sorted. I can rename this parameter to `segmentSortedByTime` or something and it
might be clearer?
>also, some projections are named as daily, e.x. abfoo_daily but it's not
grouping by DAY right?
ah I guess I could rename them to leave the granularity off of the implicit
ones, some of those are projections without a __time equivalent column, but
they still live within the constraints of the segment granularity, so its an
implicit time_floor of day in this case since that is when the segment interval
starts.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]