cecemei commented on code in PR #18078:
URL: https://github.com/apache/druid/pull/18078#discussion_r2129903598
##########
processing/src/main/java/org/apache/druid/segment/virtual/ExpressionSelectors.java:
##########
@@ -132,35 +83,8 @@ public static ColumnValueSelector
makeStringColumnValueSelector(
{
final ColumnValueSelector<ExprEval> baseSelector =
makeExprEvalSelector(columnSelectorFactory, expression);
- return new ColumnValueSelector()
+ return new EvalUnwrappingColumnValueSelector(baseSelector)
Review Comment:
Maybe we can add a `coerceEvalToObjectOrList` boolean to
EvalUnwrappingColumnValueSelector, so it encapsulates the
`coerceEvalToObjectOrList` logic?
I also wonder maybe the flatten is not needed since we already cast single
element array?
##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteNestedDataQueryTest.java:
##########
@@ -7386,4 +7477,173 @@ public void testCountPathWithArraysReturning()
.build()
);
}
+
+ @Test
+ public void testSumPathWithArraysRealtime()
+ {
+ /*
+ "obj":{... "c": [100], ...}
+ "obj":{... "c": ["a", "b"], ...}
+ "obj":{...}
+ "obj":{... "c": {"a": 1}, ...},
+ "obj":{... "c": "hello", ...},
+ "obj":{... "c": 12.3, ...},
+ "obj":{... "c": null, ...},
+ */
+ skipVectorize();
+ testQuery(
+ "SELECT "
+ + "SUM(JSON_VALUE(obj, '$.c')) "
+ + "FROM druid.all_auto_realtime",
+ ImmutableList.of(
+ Druids.newTimeseriesQueryBuilder()
+ .dataSource(DATA_SOURCE_ALL_REALTIME)
+ .intervals(querySegmentSpec(Filtration.eternity()))
+ .granularity(Granularities.ALL)
+ .virtualColumns(new NestedFieldVirtualColumn("obj", "$.c",
"v0", ColumnType.DOUBLE))
+ .aggregators(aggregators(new
DoubleSumAggregatorFactory("a0", "v0")))
+ .context(QUERY_CONTEXT_DEFAULT)
+ .build()
+ ),
+ ImmutableList.of(
+ new Object[]{112.3d}
+ ),
+ RowSignature.builder()
+ .add("EXPR$0", ColumnType.DOUBLE)
+ .build()
+ );
+ }
+
+ @Test
+ public void testCountPathWithArraysRealtime()
+ {
+ /*
+ "obj":{... "c": [100], ...}
+ "obj":{... "c": ["a", "b"], ...}
+ "obj":{...}
+ "obj":{... "c": {"a": 1}, ...},
+ "obj":{... "c": "hello", ...},
+ "obj":{... "c": 12.3, ...},
+ "obj":{... "c": null, ...},
+ */
+ // capturing existing behavior... the count should be 4 if it was counting
all non-null primitive values, but that
+ // would mean that the virtual column would need to plan as ARRAY<STRING>
expected type instead of STRING
+ // ... you might notice there are actually 5 non-null obj.c values,
however json_value only returns primitive
+ // values, so the object row is rightfully skipped
+ skipVectorize();
+ testQuery(
Review Comment:
I guess we're saying single element array can be auto casted to string,
while multiple elements array would return null, this sounds a bit unintuitive.
Maybe we should also put down comments like, this is suboptimal and the purpose
of this test case is just to reflect current behavior but could be improved.
##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteNestedDataQueryTest.java:
##########
@@ -6804,44 +6804,93 @@ public void testUnnestJsonQueryArraysJsonValueSum()
@Test
public void testJsonValueNestedEmptyArray()
Review Comment:
nit: consider add a few values when '$.z.d' array is not empty, the test
should be nested ~empty~ arrayright?
##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteNestedDataQueryTest.java:
##########
@@ -7386,4 +7477,173 @@ public void testCountPathWithArraysReturning()
.build()
);
}
+
+ @Test
+ public void testSumPathWithArraysRealtime()
+ {
+ /*
+ "obj":{... "c": [100], ...}
+ "obj":{... "c": ["a", "b"], ...}
+ "obj":{...}
+ "obj":{... "c": {"a": 1}, ...},
+ "obj":{... "c": "hello", ...},
+ "obj":{... "c": 12.3, ...},
+ "obj":{... "c": null, ...},
+ */
+ skipVectorize();
+ testQuery(
+ "SELECT "
+ + "SUM(JSON_VALUE(obj, '$.c')) "
+ + "FROM druid.all_auto_realtime",
+ ImmutableList.of(
+ Druids.newTimeseriesQueryBuilder()
+ .dataSource(DATA_SOURCE_ALL_REALTIME)
+ .intervals(querySegmentSpec(Filtration.eternity()))
+ .granularity(Granularities.ALL)
+ .virtualColumns(new NestedFieldVirtualColumn("obj", "$.c",
"v0", ColumnType.DOUBLE))
+ .aggregators(aggregators(new
DoubleSumAggregatorFactory("a0", "v0")))
+ .context(QUERY_CONTEXT_DEFAULT)
+ .build()
+ ),
+ ImmutableList.of(
Review Comment:
I was a bit suprised by this auto casting single element array to scalar, I
wonder maybe array should be moved to JSON_QUERY, and the casting should be
more an explicit requirement.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]