clintropolis commented on code in PR #18215:
URL: https://github.com/apache/druid/pull/18215#discussion_r2193628049


##########
processing/src/test/java/org/apache/druid/segment/CursorFactoryProjectionTest.java:
##########
@@ -1102,6 +1102,61 @@ public void 
testQueryGranularityFitsProjectionGranularity()
     }
   }
 
+  @Test
+  public void testQueryGranularityLargerProjectionGranularity()
+  {
+    final GroupByQuery.Builder queryBuilder =
+        GroupByQuery.builder()
+                    .setDataSource("test")
+                    .setInterval(Intervals.ETERNITY)
+                    .addAggregator(new LongSumAggregatorFactory("c_sum", "c"));
+    if (sortByDim) {

Review Comment:
   >i thought this meant query has a sortBy at first, but i guess here's just 
we want to test out time as granularity & virtual column? might worth some 
comments.
   
   yea, `sortByDim` is whether or not the segment we are querying is stored 
sorted first by `__time` or not, its one of the test parameters which controls 
if we set the `forceSegmentSortByTime` parameter on `DImensionsSpec` added in 
#16849. When this is set, using a granularity on a query doesn't work because 
the assumptions the engines make about the segments when using granularity 
(which acts like an implicit group by time_floor(__time, granularity)) are that 
the rows of the cursor are ordered by __time and so it can just scan and 
collect the time bucket group of the granularity as it goes instead of grouping 
on it properly as if it were a dimension. So for group by, we have to 
explicitly make a time_floor and group on it instead of using the query 
granularity. Timeseries queries don't work either if the segment isn't first 
sorted by __time, so some of the tests also just skip if the segment isn't time 
sorted. I can rename this parameter to `segmentSortedByTime` or something and it
  might be clearer?
   
   >also, some projections are named as daily, e.x. abfoo_daily but it's not 
grouping by DAY right?
   ah I guess I could rename them to leave the granularity off of the implicit 
ones, some of those are projections without a __time equivalent column, but 
they still live within the constraints of the segment granularity, so its an 
implicit time_floor of day in this case since that is when the segment interval 
starts.



##########
processing/src/test/java/org/apache/druid/segment/CursorFactoryProjectionTest.java:
##########
@@ -1102,6 +1102,61 @@ public void 
testQueryGranularityFitsProjectionGranularity()
     }
   }
 
+  @Test
+  public void testQueryGranularityLargerProjectionGranularity()
+  {
+    final GroupByQuery.Builder queryBuilder =
+        GroupByQuery.builder()
+                    .setDataSource("test")
+                    .setInterval(Intervals.ETERNITY)
+                    .addAggregator(new LongSumAggregatorFactory("c_sum", "c"));
+    if (sortByDim) {

Review Comment:
   >i thought this meant query has a sortBy at first, but i guess here's just 
we want to test out time as granularity & virtual column? might worth some 
comments.
   
   yea, `sortByDim` is whether or not the segment we are querying is stored 
sorted first by `__time` or not, its one of the test parameters which controls 
if we set the `forceSegmentSortByTime` parameter on `DImensionsSpec` added in 
#16849. When this is set, using a granularity on a query doesn't work because 
the assumptions the engines make about the segments when using granularity 
(which acts like an implicit group by time_floor(__time, granularity)) are that 
the rows of the cursor are ordered by __time and so it can just scan and 
collect the time bucket group of the granularity as it goes instead of grouping 
on it properly as if it were a dimension. So for group by, we have to 
explicitly make a time_floor and group on it instead of using the query 
granularity. Timeseries queries don't work either if the segment isn't first 
sorted by __time, so some of the tests also just skip if the segment isn't time 
sorted. I can rename this parameter to `segmentSortedByTime` or something and it
  might be clearer?
   
   >also, some projections are named as daily, e.x. abfoo_daily but it's not 
grouping by DAY right?
   
   ah I guess I could rename them to leave the granularity off of the implicit 
ones, some of those are projections without a __time equivalent column, but 
they still live within the constraints of the segment granularity, so its an 
implicit time_floor of day in this case since that is when the segment interval 
starts.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to