imply-cheddar commented on code in PR #13799:
URL: https://github.com/apache/druid/pull/13799#discussion_r1125926722


##########
processing/src/main/java/org/apache/druid/segment/UnnestColumnValueSelectorCursor.java:
##########
@@ -86,7 +88,11 @@ public UnnestColumnValueSelectorCursor(
     this.index = 0;
     this.outputName = outputColumnName;
     this.needInitialization = true;
-    this.allowSet = allowSet;
+    if (filter != null) {
+      this.valueMatcher = filter.makeMatcher(getColumnSelectorFactory());

Review Comment:
   Given that this is called in the constructor and also returned from a 
public method, maybe switch it around: have the constructor create the 
`ColumnSelectorFactory` once, store the reference, and then use that stored 
reference both here and as the return value of the `getColumnSelectorFactory` 
method.
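
   A minimal sketch of that suggestion (the interfaces below are stand-ins for illustration, not the real Druid types):

```java
// Stand-in interfaces for illustration only; the real types live in the
// org.apache.druid.segment and org.apache.druid.query.filter packages.
interface ColumnSelectorFactory {}

interface ValueMatcher
{
  boolean matches();
}

interface Filter
{
  ValueMatcher makeMatcher(ColumnSelectorFactory factory);
}

class CursorSketch
{
  private final ColumnSelectorFactory columnSelectorFactory;
  private final ValueMatcher valueMatcher;

  CursorSketch(Filter filter)  // filter may be null
  {
    // Create the factory once in the constructor and store the reference...
    this.columnSelectorFactory = new ColumnSelectorFactory() {};
    // ...then use the stored field here instead of calling the public getter.
    this.valueMatcher = (filter == null)
                        ? () -> true  // match-everything fallback
                        : filter.makeMatcher(columnSelectorFactory);
  }

  public ColumnSelectorFactory getColumnSelectorFactory()
  {
    return columnSelectorFactory;  // the same stored instance every time
  }

  public boolean matchesCurrentRow()
  {
    return valueMatcher.matches();
  }
}
```

   This way the constructor and the public getter are guaranteed to hand out the same factory instance.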



##########
processing/src/main/java/org/apache/druid/segment/UnnestDimensionCursor.java:
##########
@@ -79,31 +81,32 @@ public class UnnestDimensionCursor implements Cursor
   private final DimensionSelector dimSelector;
   private final String columnName;
   private final String outputName;
-  private final LinkedHashSet<String> allowSet;
-  private final BitSet allowedBitSet;
   private final ColumnSelectorFactory baseColumnSelectorFactory;
-  private int index;
-  @Nullable private IndexedInts indexedIntsForCurrentRow;
+  @Nullable
+  private final Filter allowFilter;
+  private int indexForRow;
+  @Nullable
+  private IndexedInts indexedIntsForCurrentRow;
   private boolean needInitialization;
   private SingleIndexInts indexIntsForRow;
+  private ValueMatcher valueMatcher;
 
   public UnnestDimensionCursor(
       Cursor cursor,
       ColumnSelectorFactory baseColumnSelectorFactory,
       String columnName,
       String outputColumnName,
-      LinkedHashSet<String> allowSet
+      @Nullable Filter allowFilter
   )
   {
     this.baseCursor = cursor;
     this.baseColumnSelectorFactory = baseColumnSelectorFactory;
     this.dimSelector = this.baseColumnSelectorFactory.makeDimensionSelector(DefaultDimensionSpec.of(columnName));
     this.columnName = columnName;
-    this.index = 0;
+    this.indexForRow = 0;
     this.outputName = outputColumnName;
     this.needInitialization = true;
-    this.allowSet = allowSet;
-    this.allowedBitSet = new BitSet();
+    this.allowFilter = allowFilter;

Review Comment:
   Paul's comment is asking you to re-evaluate whether you need to store the 
reference to the `Filter`.  In the ColumnValue version of the cursor, you do 
*not* store the reference to the `Filter` and instead generate a `ValueMatcher` 
that matches everything.  But this code keeps the reference, apparently so 
that it can do an `instanceof` check later.
   
   I think the answer to Paul's comment is that you still need to add the 
BitSet logic back to this implementation; once you do that, you won't need to 
keep the reference to the `Filter` object and things will clean up a bit.
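
   A rough sketch of that BitSet approach (illustrative names; the real implementation would build the set against the `DimensionSelector`'s value dictionary):

```java
import java.util.BitSet;
import java.util.LinkedHashSet;
import java.util.List;

// Illustrative sketch: precompute which dictionary ids are allowed in a single
// pass at construction time.  After that, the per-row check is just a bit
// lookup and no reference to the Filter (or the allow-set) has to be retained.
class AllowBitSetSketch
{
  private final BitSet allowedBitSet = new BitSet();

  AllowBitSetSketch(LinkedHashSet<String> allowSet, List<String> dictionary)
  {
    // One pass over the dictionary; the allow-set is not kept afterwards.
    for (int id = 0; id < dictionary.size(); id++) {
      if (allowSet.contains(dictionary.get(id))) {
        allowedBitSet.set(id);
      }
    }
  }

  boolean matches(int dictionaryId)
  {
    return allowedBitSet.get(dictionaryId);
  }
}
```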



##########
processing/src/test/java/org/apache/druid/segment/UnnestStorageAdapterTest.java:
##########
@@ -249,150 +242,43 @@ public void test_two_levels_of_unnest_adapters()
       unnest 2 rows -> 16 entries also the value cardinality
       unnest of unnest -> 16*8 = 128 rows
        */
-      Assert.assertEquals(count, 128);
-      Assert.assertEquals(dimSelector.getValueCardinality(), 16);
+      Assert.assertEquals(128, count);
+      Assert.assertEquals(16, dimSelector.getValueCardinality());
       return null;
     });
   }
 
   @Test
-  public void test_unnest_adapters_with_allowList()
+  public void test_unnest_adapters_basic_with_in_filter()

Review Comment:
   If there is a test suite for "normal" cursor stuff, it would be great to be 
able to plug this into one of those.  Those tests should all pass without 
errors as long as they aren't referencing the actual unnested column.
   
   I haven't looked at how easy it will be to reuse that test suite, but it's 
worth exploring what can be done to reuse it.



##########
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidCorrelateUnnestRel.java:
##########
@@ -215,6 +254,26 @@ public DruidQuery toDruidQuery(boolean finalizeAggregations)
     );
   }
 
+  private PartialDruidQuery updateCorrPartialQueryFromLeft(PartialDruidQuery partialQueryFromLeft)
+  {
+    // The DruidCorrelateRule already creates the project and pushes it on the top level
+    // So get select project from partialQuery
+    // The filters are present on the partial query of the left
+    // The group by and having clauses would be on the top level
+    // Same for the sort
+    PartialDruidQuery corrQuery = PartialDruidQuery.create(correlateRel);
+    corrQuery = corrQuery.withWhereFilter(partialQueryFromLeft.getWhereFilter())
+                         .withSelectProject(partialQuery.getSelectProject());
+    if (partialQuery.getAggregate() != null) {
+      corrQuery = corrQuery.withAggregate(partialQuery.getAggregate())
+                           .withHavingFilter(partialQuery.getHavingFilter());
+    }
+    if (partialQuery.getSort() != null || partialQuery.getSortProject() != null) {
+      corrQuery = corrQuery.withSort(partialQuery.getSort());
+    }
+    return corrQuery;
+  }
+

Review Comment:
   This feels like we are throwing away the left-hand side and instead trying 
to build a query from the unnest part?  That seems wrong to me: we should be 
using the unnest on the RHS to augment the query from the LHS (attaching it to 
the datasource).  The LHS is the actual working query; the RHS is just the 
addition of the unnest portion.
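
   In sketch form (hypothetical names, not the actual Druid planner API), the shape being suggested is roughly:

```java
// Hypothetical stand-ins, not real Druid planner types: the point is that the
// RHS contributes only the unnest wrapping of the datasource, while the LHS
// query itself (filters, projections, aggregations, sort) is kept intact.
class AugmentLeftSketch
{
  static String wrapLeftDataSource(String leftDataSource, String columnToUnnest)
  {
    // Augment rather than rebuild: only the datasource changes.
    return "unnest(" + leftDataSource + ", " + columnToUnnest + ")";
  }
}
```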



##########
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidCorrelateUnnestRel.java:
##########
@@ -136,13 +140,22 @@ public DruidQuery toDruidQuery(boolean finalizeAggregations)
     final DruidRel<?> druidQueryRel = (DruidRel<?>) left;
     final DruidQuery leftQuery = Preconditions.checkNotNull((druidQueryRel).toDruidQuery(false), "leftQuery");
     final DataSource leftDataSource;
+    final PartialDruidQuery partialQueryFromLeft = druidQueryRel.getPartialDruidQuery();
+    final PartialDruidQuery corrPartialQuey;
+
+    // If there is a LIMIT in the left query
+    // It should be honored before unnest
+    // Create a query data source in that case

Review Comment:
   What kind of query is that?  Do we have a test covering it?  I'm not sure I 
believe this comment is true, but I also am not able to imagine a query that 
can do this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

