imply-cheddar commented on code in PR #13799:
URL: https://github.com/apache/druid/pull/13799#discussion_r1125926722
##########
processing/src/main/java/org/apache/druid/segment/UnnestColumnValueSelectorCursor.java:
##########
@@ -86,7 +88,11 @@ public UnnestColumnValueSelectorCursor(
this.index = 0;
this.outputName = outputColumnName;
this.needInitialization = true;
- this.allowSet = allowSet;
+ if (filter != null) {
+ this.valueMatcher = filter.makeMatcher(getColumnSelectorFactory());
Review Comment:
Given that this is called in the constructor and then also returned as a
public method, maybe switch it around to have the constructor create the
`ColumnSelectorFactory`, store the reference and then use that here/return it
from the `getColumnSelectorFactory` method.
##########
processing/src/main/java/org/apache/druid/segment/UnnestDimensionCursor.java:
##########
@@ -79,31 +81,32 @@ public class UnnestDimensionCursor implements Cursor
private final DimensionSelector dimSelector;
private final String columnName;
private final String outputName;
- private final LinkedHashSet<String> allowSet;
- private final BitSet allowedBitSet;
private final ColumnSelectorFactory baseColumnSelectorFactory;
- private int index;
- @Nullable private IndexedInts indexedIntsForCurrentRow;
+ @Nullable
+ private final Filter allowFilter;
+ private int indexForRow;
+ @Nullable
+ private IndexedInts indexedIntsForCurrentRow;
private boolean needInitialization;
private SingleIndexInts indexIntsForRow;
+ private ValueMatcher valueMatcher;
public UnnestDimensionCursor(
Cursor cursor,
ColumnSelectorFactory baseColumnSelectorFactory,
String columnName,
String outputColumnName,
- LinkedHashSet<String> allowSet
+ @Nullable Filter allowFilter
)
{
this.baseCursor = cursor;
this.baseColumnSelectorFactory = baseColumnSelectorFactory;
this.dimSelector =
this.baseColumnSelectorFactory.makeDimensionSelector(DefaultDimensionSpec.of(columnName));
this.columnName = columnName;
- this.index = 0;
+ this.indexForRow = 0;
this.outputName = outputColumnName;
this.needInitialization = true;
- this.allowSet = allowSet;
- this.allowedBitSet = new BitSet();
+ this.allowFilter = allowFilter;
Review Comment:
Paul's comment is asking you to re-evaluate if you need to store the
reference to the Filter. In the ColumnValue version of the cursor, you do
*not* store the reference to the Filter and instead generate a `ValueMatcher`
that matches everything. But this code is keeping the reference, it would
appear so that it can do an `instanceof` check.
I think the answer to Paul's comment is that you still need to do the logic
to add the BitSet back to this implementation and, once you do that, you won't
keep the reference to the `Filter` object and things will clean up a bit.
##########
processing/src/test/java/org/apache/druid/segment/UnnestStorageAdapterTest.java:
##########
@@ -249,150 +242,43 @@ public void test_two_levels_of_unnest_adapters()
unnest 2 rows -> 16 entries also the value cardinality
unnest of unnest -> 16*8 = 128 rows
*/
- Assert.assertEquals(count, 128);
- Assert.assertEquals(dimSelector.getValueCardinality(), 16);
+ Assert.assertEquals(128, count);
+ Assert.assertEquals(16, dimSelector.getValueCardinality());
return null;
});
}
@Test
- public void test_unnest_adapters_with_allowList()
+ public void test_unnest_adapters_basic_with_in_filter()
Review Comment:
If there is a test suite for "normal" cursor stuff, it would be great to be
able to plug this into one of those. Those should all be able to pass without
any errors as long as they aren't referencing the actual unnested column.
I haven't looked to see how easy it will be to reuse that test suite, but
it's worth it to explore what can be done to be able to reuse it.
##########
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidCorrelateUnnestRel.java:
##########
@@ -215,6 +254,26 @@ public DruidQuery toDruidQuery(boolean
finalizeAggregations)
);
}
+ private PartialDruidQuery updateCorrPartialQueryFromLeft(PartialDruidQuery
partialQueryFromLeft)
+ {
+ // The DruidCorrelateRule already creates the project and pushes it on the
top level
+ // So get select project from partialQuery
+ // The filters are present on the partial query of the left
+ // The group by and having clauses would be on the top level
+ // Same for the sort
+ PartialDruidQuery corrQuery = PartialDruidQuery.create(correlateRel);
+ corrQuery =
corrQuery.withWhereFilter(partialQueryFromLeft.getWhereFilter())
+ .withSelectProject(partialQuery.getSelectProject());
+ if (partialQuery.getAggregate() != null) {
+ corrQuery = corrQuery.withAggregate(partialQuery.getAggregate())
+ .withHavingFilter(partialQuery.getHavingFilter());
+ }
+ if (partialQuery.getSort() != null || partialQuery.getSortProject() !=
null) {
+ corrQuery = corrQuery.withSort(partialQuery.getSort());
+ }
+ return corrQuery;
+ }
+
Review Comment:
This feels like we are throwing away the left-hand side and instead trying
to build a query from the unnest part? Taht seems wrong to me, we should be
using the unnest on the RHS to augment the query from the LHS (attach to the
datasource). the LHS is the actual working query, the RHS is just the addition
of the unnest portion.
##########
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidCorrelateUnnestRel.java:
##########
@@ -136,13 +140,22 @@ public DruidQuery toDruidQuery(boolean
finalizeAggregations)
final DruidRel<?> druidQueryRel = (DruidRel<?>) left;
final DruidQuery leftQuery =
Preconditions.checkNotNull((druidQueryRel).toDruidQuery(false), "leftQuery");
final DataSource leftDataSource;
+ final PartialDruidQuery partialQueryFromLeft =
druidQueryRel.getPartialDruidQuery();
+ final PartialDruidQuery corrPartialQuey;
+
+ // If there is a LIMIT in the left query
+ // It should be honored before unnest
+ // Create a query data source in that case
Review Comment:
What kind of query is that? Do we have a test covering it? I'm not sure I
believe this comment is true, but I also am not able to imagine a query that
can do this.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]