gianm commented on code in PR #16533:
URL: https://github.com/apache/druid/pull/16533#discussion_r1704293481


##########
processing/src/main/java/org/apache/druid/segment/CursorBuildSpec.java:
##########
@@ -0,0 +1,241 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment;
+
+import org.apache.druid.java.util.common.Intervals;
+import org.apache.druid.java.util.common.granularity.Granularities;
+import org.apache.druid.java.util.common.granularity.Granularity;
+import org.apache.druid.query.QueryContext;
+import org.apache.druid.query.QueryMetrics;
+import org.apache.druid.query.aggregation.AggregatorFactory;
+import org.apache.druid.query.filter.Filter;
+import org.joda.time.Interval;
+
+import javax.annotation.Nullable;
+import java.util.List;
+
+public class CursorBuildSpec
+{
+  public static final CursorBuildSpec FULL_SCAN = CursorBuildSpec.builder().setGranularity(Granularities.ALL).build();
+
+  public static CursorBuildSpecBuilder builder()
+  {
+    return new CursorBuildSpecBuilder();
+  }
+
+  public static CursorBuildSpecBuilder builder(CursorBuildSpec spec)
+  {
+    return new CursorBuildSpecBuilder(spec);
+  }
+
+  @Nullable
+  private final Filter filter;
+  private final Interval interval;
+  private final Granularity granularity;
+  @Nullable
+  private final List<String> groupingColumns;
+  private final VirtualColumns virtualColumns;
+  @Nullable
+  private final List<AggregatorFactory> aggregators;
+
+  private final QueryContext queryContext;
+
+  private final boolean descending;
+  @Nullable
+  private final QueryMetrics<?> queryMetrics;
+
+  public CursorBuildSpec(
+      @Nullable Filter filter,
+      Interval interval,
+      Granularity granularity,
+      @Nullable List<String> groupingColumns,
+      VirtualColumns virtualColumns,
+      @Nullable List<AggregatorFactory> aggregators,
+      QueryContext queryContext,
+      boolean descending,
+      @Nullable QueryMetrics<?> queryMetrics
+  )
+  {
+    this.filter = filter;
+    this.interval = interval;
+    this.granularity = granularity;
+    this.groupingColumns = groupingColumns;
+    this.virtualColumns = virtualColumns;
+    this.aggregators = aggregators;
+    this.descending = descending;
+    this.queryContext = queryContext;
+    this.queryMetrics = queryMetrics;
+  }
+
+  @Nullable
+  public Filter getFilter()
+  {
+    return filter;
+  }
+
+  public Interval getInterval()
+  {
+    return interval;
+  }
+
+  public Granularity getGranularity()
+  {
+    return granularity;
+  }
+
+  @Nullable
+  public List<String> getGroupingColumns()
+  {
+    return groupingColumns;
+  }
+
+  public VirtualColumns getVirtualColumns()
+  {
+    return virtualColumns;
+  }
+
+  @Nullable
+  public List<AggregatorFactory> getAggregators()
+  {
+    return aggregators;
+  }
+
+  public boolean isDescending()

Review Comment:
   The use case I have in mind is someone running many `GROUP BY sourceIp` and 
`GROUP BY destinationIp` queries. To speed those up, we'd want to store two 
projections, one sorted by `sourceIp` and one sorted by `destinationIp`. The 
grouping engine could then do a streaming aggregation, without any temporary 
hash table, because the rows are already in the right sorted order. We don't 
have the code in `groupBy` to do this yet, but I was thinking we would add it.
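   
   As a rough illustration of what streaming aggregation over pre-sorted rows 
could look like, here is a minimal sketch. `StreamingGroupBySketch`, `Row`, 
`Group`, and `aggregateSorted` are hypothetical names invented for this 
example. None of them exist in Druid, and the real `groupBy` engine would look 
quite different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these types are not part of Druid.
final class StreamingGroupBySketch
{
  /** One input row: a grouping key (for example sourceIp) and a value to sum. */
  static final class Row
  {
    final String key;
    final long value;

    Row(String key, long value)
    {
      this.key = key;
      this.value = value;
    }
  }

  /** One output group: the key and the sum of its values. */
  static final class Group
  {
    final String key;
    final long sum;

    Group(String key, long sum)
    {
      this.key = key;
      this.sum = sum;
    }
  }

  /**
   * Sums values per key, assuming rows arrive sorted by a non-null key.
   * Only the current group is held in memory, so no hash table is needed.
   */
  static List<Group> aggregateSorted(Iterable<Row> sortedRows)
  {
    final List<Group> out = new ArrayList<>();
    String currentKey = null;
    long currentSum = 0;
    boolean haveGroup = false;

    for (Row row : sortedRows) {
      if (haveGroup && !row.key.equals(currentKey)) {
        // The key changed, so the previous group is complete and can be emitted.
        out.add(new Group(currentKey, currentSum));
        currentSum = 0;
      }
      currentKey = row.key;
      currentSum += row.value;
      haveGroup = true;
    }
    if (haveGroup) {
      out.add(new Group(currentKey, currentSum));
    }
    return out;
  }
}
```

   If the input were not sorted by the key, the same key could show up in more 
than one output group, which is why the engine would need to know the cursor's 
actual ordering before taking this path.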
   
   To support that, the query engines would need some way of passing a 
preferred ordering to the cursor maker. I think the cursor would also need a 
way of telling the query engine what its sort order actually is.
   
   > Specifically, I'm thinking mostly about topN and groupBy, which never set 
   > descending and do their own sorting. Group by has an additional problem, which 
   > if we order by time descending, with this new logic it would try to do that to 
   > the cursor, which would mean it couldn't vectorize, but the engine doesn't 
   > actually care if the results are ordered by the queries order by or not and 
   > expects everything to be not descending today.
   
   For this case, since the grouping engines don't take advantage of sortedness 
today, they should pass null or an empty list for the preferred sort order 
parameter. If they start caring about ordering at some point, that can change 
then.
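   
   To make the preferred-ordering idea concrete, here is a minimal sketch of the 
two-way exchange: the engine states a preference, and the cursor side reports 
the ordering it actually has. All names here (`OrderingNegotiationSketch`, 
`Spec`, `Projection`, `choose`, `canStream`) are hypothetical and are not part 
of this PR or of Druid. The sketch also assumes the first candidate is the base 
table.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; these types are not part of Druid.
final class OrderingNegotiationSketch
{
  /** Stand-in for a build spec that carries a preferred ordering. */
  static final class Spec
  {
    final List<String> preferredOrdering;

    Spec(List<String> preferredOrdering)
    {
      // Null and empty both mean "no preference".
      this.preferredOrdering =
          preferredOrdering == null ? Collections.<String>emptyList() : preferredOrdering;
    }
  }

  /** Stand-in for a stored projection that knows its own sort order. */
  static final class Projection
  {
    final String name;
    final List<String> ordering;

    Projection(String name, List<String> ordering)
    {
      this.name = name;
      this.ordering = ordering;
    }
  }

  /** True if {@code ordering} begins with all columns of {@code prefix}. */
  static boolean startsWith(List<String> ordering, List<String> prefix)
  {
    return prefix.size() <= ordering.size()
           && ordering.subList(0, prefix.size()).equals(prefix);
  }

  /**
   * Picks the first candidate whose ordering matches the preference.
   * With no preference, or no match, the first candidate is returned.
   */
  static Projection choose(Spec spec, List<Projection> candidates)
  {
    for (Projection candidate : candidates) {
      if (startsWith(candidate.ordering, spec.preferredOrdering)) {
        return candidate;
      }
    }
    return candidates.get(0);
  }

  /** Engine-side check: stream only if the reported ordering covers the grouping columns. */
  static boolean canStream(Projection chosen, List<String> groupingColumns)
  {
    return !groupingColumns.isEmpty() && startsWith(chosen.ordering, groupingColumns);
  }
}
```

   An engine that ignores sortedness would build `new Spec(null)` and get the 
first candidate back, which matches the "pass null or empty" behavior described 
above.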



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

