[GitHub] [druid] paul-rogers commented on a diff in pull request #13764: Fixing a data correctness issue in unnest when first row of an MVD is null

via GitHub Wed, 08 Feb 2023 13:23:55 -0800


paul-rogers commented on code in PR #13764:
URL: https://github.com/apache/druid/pull/13764#discussion_r1100676736



##########
sql/src/test/java/org/apache/druid/sql/calcite/util/CalciteTests.java:
##########
@@ -97,6 +97,7 @@
   public static final String DATASOURCE3 = "numfoo";
   public static final String DATASOURCE4 = "foo4";
   public static final String DATASOURCE5 = "lotsocolumns";
+  public static final String DATASOURCE6 = "unnestnumfoo";

Review Comment:
   Better name? `nested` perhaps? Also, would be cool to add a comment with the 
schema: I find it hard to suss that out from the code.



##########
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidCorrelateUnnestRel.java:
##########
@@ -200,11 +200,36 @@ public DruidQuery toDruidQuery(boolean 
finalizeAggregations)
     // This is necessary to handle the virtual columns on the unnestProject
     // Also create the unnest datasource to be used by the partial query
     PartialDruidQuery partialDruidQuery = unnestProjectNeeded ? 
partialQuery.withUnnest(unnestProject) : partialQuery;
+    String outputColName = 
unnestDatasourceRel.getUnnestProject().getRowType().getFieldNames().get(0);
+
+    // In case of nested queries for e.g.
+    // with t AS (select * from druid.numfoo, unnest(MV_TO_ARRAY(dim3)) as 
unnested (d3))
+    // select d2,d3 from t, UNNEST(MV_TO_ARRAY(dim2)) as foo(d2)
+    // which plans to
+    // 186:LogicalCorrelate
+    //  178:LogicalProject
+    //    176:LogicalCorrelate
+    //      13:LogicalTableScan(subset=[rel#168:Subset#0.NONE.[]], 
table=[[druid, numfoo]])
+    //      172:Uncollect(subset=[rel#173:Subset#3.NONE.[]])
+    //        170:LogicalProject(subset=[rel#171:Subset#2.NONE.[]], 
EXPR$0=[MV_TO_ARRAY($cor0.dim3)])
+    //          14:LogicalValues(subset=[rel#169:Subset#1.NONE.[0]], 
tuples=[[{ 0 }]])
+    //  182:Uncollect(subset=[rel#183:Subset#8.NONE.[]])
+    //    180:LogicalProject(subset=[rel#181:Subset#7.NONE.[]], 
EXPR$0=[MV_TO_ARRAY($cor1.dim2)])
+    //      14:LogicalValues(subset=[rel#169:Subset#1.NONE.[0]], tuples=[[{ 0 
}]])
+    //
+    // the column name cannot be EXPR$0 for both inner and outer. The inner 
one which gets executed first gets the name
+    // EXPR$0 and as we move up the tree we add a 0 at the end to make the top 
level EXPR$00.
+    // Ideally these names should be replaced by the alias names specified in 
the query. Any future developer if
+    // able to find these alias names should replace EXPR$0 by dim3 and 
EXPR$00 by dim2, i.e use the correct name from Calcite

Review Comment:
   Thanks much for the detailed explanation!



##########
sql/src/test/java/org/apache/druid/sql/calcite/util/TestDataBuilder.java:
##########
@@ -351,6 +351,100 @@ public Optional<Joinable> build(
   public static final List<InputRow> ROWS1 =
       
RAW_ROWS1.stream().map(TestDataBuilder::createRow).collect(Collectors.toList());
 
+  public static final List<ImmutableMap<String, Object>> RAW_ROWS_FOR_UNNEST = 
ImmutableList.of(

Review Comment:
   Does this have all the interesting corner cases? Empty arrays or objects? 
Null values? Fields that appear in one nested object but not another (in both 
orders: (a,b), (a), (a,c))? And so on. To help future readers, might be handy 
to add a comment above each `.put(` call that sets up one of these cases.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [druid] paul-rogers commented on a diff in pull request #13764: Fixing a data correctness issue in unnest when first row of an MVD is null

Reply via email to