kgyrtkirk commented on code in PR #15434:
URL: https://github.com/apache/druid/pull/15434#discussion_r1407192800
##########
docs/querying/sql-functions.md:
##########
@@ -50,7 +50,7 @@ Calculates the arc cosine of a numeric expression.
## ANY_VALUE
-`ANY_VALUE(expr, [maxBytesPerValue])`
+`ANY_VALUE(expr, [maxBytesPerValue, aggregateMultipleValues])`
Review Comment:
I believe `ANY_VALUE(expr, [maxBytesPerValue, aggregateMultipleValues])`
means that the 2 way this function can be called is:
`ANY_VALUE(expr)` and `ANY_VALUE(expr, maxBytesPerValue,
aggregateMultipleValues)`
shouldn't this be `ANY_VALUE(expr, [maxBytesPerValue,
[aggregateMultipleValues]])` ?
##########
processing/src/main/java/org/apache/druid/query/aggregation/any/StringAnyVectorAggregator.java:
##########
@@ -78,7 +84,22 @@ public void aggregate(ByteBuffer buf, int position, int
startRow, int endRow)
if (startRow < rows.length) {
IndexedInts row = rows[startRow];
@Nullable
- String foundValue = row.size() == 0 ? null :
multiValueSelector.lookupName(row.get(0));
+ String foundValue;
+ if (row.size() == 0) {
+ foundValue = null;
+ } else if (aggregateMultipleValues) {
+ if (row.size() > 1) {
+ List<String> arrayList = new ArrayList<>();
+ row.forEach(rowIndex -> {
+ arrayList.add(multiValueSelector.lookupName(rowIndex));
+ });
+ foundValue =
DimensionHandlerUtils.convertObjectToString(arrayList);
+ } else {
+ foundValue = multiValueSelector.lookupName(row.get(0));
+ }
+ } else {
+ foundValue = multiValueSelector.lookupName(row.get(0));
+ }
Review Comment:
note: I beleive this could be moved into a method; so that every
`foundValue` assignment could become a `return`
##########
sql/src/main/java/org/apache/druid/sql/calcite/aggregation/builtin/EarliestLatestAnySqlAggregator.java:
##########
@@ -244,17 +249,35 @@ public Aggregation toDruidAggregation(
final AggregatorFactory theAggFactory;
switch (args.size()) {
case 1:
- theAggFactory = aggregatorType.createAggregatorFactory(aggregatorName,
fieldName, null, outputType, null);
+ theAggFactory = aggregatorType.createAggregatorFactory(aggregatorName,
fieldName, null, outputType, null, true);
break;
case 2:
- int maxStringBytes;
+ Integer maxStringBytes = getMaxStringBytes(rexNodes.get(1),
aggregateCall, plannerContext);
+ if (maxStringBytes == null) {
+ return null;
+ }
+ theAggFactory = aggregatorType.createAggregatorFactory(
+ aggregatorName,
+ fieldName,
+ null,
+ outputType,
+ maxStringBytes.intValue(),
+ true
+ );
+ break;
+ case 3:
+ maxStringBytes = getMaxStringBytes(rexNodes.get(1), aggregateCall,
plannerContext);
+ if (maxStringBytes == null) {
+ return null;
+ }
+ boolean aggregateMultipleValues;
try {
- maxStringBytes = RexLiteral.intValue(rexNodes.get(1));
+ aggregateMultipleValues =
RexLiteral.booleanValue(Preconditions.checkNotNull(rexNodes.get(2)));
}
- catch (AssertionError ae) {
+ catch (Exception ae) {
plannerContext.setPlanningError(
- "The second argument '%s' to function '%s' is not a number",
- rexNodes.get(1),
+ "The third argument '%s' to function '%s' is not a boolean",
+ rexNodes.get(2),
aggregateCall.getName()
);
Review Comment:
this is a bit too late... a check like this should happen during validation
and not during aggregator production; a straight `DruidException` might also be
better.
##########
sql/src/main/java/org/apache/druid/sql/calcite/aggregation/builtin/EarliestLatestAnySqlAggregator.java:
##########
@@ -244,17 +249,35 @@ public Aggregation toDruidAggregation(
final AggregatorFactory theAggFactory;
switch (args.size()) {
case 1:
- theAggFactory = aggregatorType.createAggregatorFactory(aggregatorName,
fieldName, null, outputType, null);
+ theAggFactory = aggregatorType.createAggregatorFactory(aggregatorName,
fieldName, null, outputType, null, true);
break;
case 2:
- int maxStringBytes;
+ Integer maxStringBytes = getMaxStringBytes(rexNodes.get(1),
aggregateCall, plannerContext);
+ if (maxStringBytes == null) {
+ return null;
+ }
+ theAggFactory = aggregatorType.createAggregatorFactory(
+ aggregatorName,
+ fieldName,
+ null,
+ outputType,
+ maxStringBytes.intValue(),
+ true
+ );
+ break;
+ case 3:
+ maxStringBytes = getMaxStringBytes(rexNodes.get(1), aggregateCall,
plannerContext);
Review Comment:
nit: copy-paste - as the `createAggregatorFactory` will be called anyway
with a `null` if its not specified; wouldn't it be better to just keep a
variable and set its value when available?
##########
processing/src/main/java/org/apache/druid/query/aggregation/any/StringAnyBufferAggregator.java:
##########
@@ -52,7 +55,13 @@ public void aggregate(ByteBuffer buf, int position)
{
if (buf.getInt(position) == NOT_FOUND_FLAG_VALUE) {
final Object object = valueSelector.getObject();
- String foundValue = DimensionHandlerUtils.convertObjectToString(object);
+ String foundValue;
+ if (object != null && object instanceof List &&
!aggregateMultipleValues) {
+ List<Object> objectList = (List) object;
+ foundValue = objectList.size() > 0 ?
DimensionHandlerUtils.convertObjectToString(objectList.get(0)) : null;
Review Comment:
shouldn't this go over all elements of the list? the vector implementation
seems to be doing that
##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java:
##########
@@ -13433,6 +13433,45 @@ public void
testStringAggExpressionNonConstantSeparator()
);
}
+ @Test
+ public void testStringAnyAggArgValidation()
Review Comment:
there seems to be multiple testcases in this test; can you separate them?
##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java:
##########
@@ -13433,6 +13433,45 @@ public void
testStringAggExpressionNonConstantSeparator()
);
}
+ @Test
+ public void testStringAnyAggArgValidation()
+ {
+ DruidException e = assertThrows(DruidException.class, () -> testBuilder()
+ .sql("SELECT ANY_VALUE(dim3, 1000, 'true') FROM foo")
+ .queryContext(ImmutableMap.of())
+ .run());
+ assertThat(
+ e,
+ invalidSqlIs(
+ "Cannot apply 'ANY_VALUE' to arguments of type
'ANY_VALUE(<VARCHAR>, <INTEGER>, <CHAR(4)>)'. Supported form(s):
'ANY_VALUE(<expr>, [<maxBytesPerString>, [<aggregateMultipleValues>]])' (line
[1], column [8])")
+ );
+ QueryTestBuilder qtb1 = testBuilder()
+ .sql("SELECT ANY_VALUE(dim3, 1000, null) FROM foo")
+ .queryContext(ImmutableMap.of());
+
+ DruidException e1 = assertThrows(DruidException.class, () -> qtb1.run());
Review Comment:
from `Query could not be planned. A possible reason is`
it seem like it went fishing - but the error is `[The third argument
'null:NULL' to function 'EXPR$0' is not a boolean]` which should be identified
much earlier.
I believe the best way to fix this inconsistency would be to make sure this
is checked during validation.
Another way around is to use `@NotYetSupported` and leave it for later....
##########
processing/src/main/java/org/apache/druid/query/aggregation/any/StringAnyVectorAggregator.java:
##########
@@ -78,7 +84,22 @@ public void aggregate(ByteBuffer buf, int position, int
startRow, int endRow)
if (startRow < rows.length) {
IndexedInts row = rows[startRow];
@Nullable
- String foundValue = row.size() == 0 ? null :
multiValueSelector.lookupName(row.get(0));
+ String foundValue;
+ if (row.size() == 0) {
+ foundValue = null;
+ } else if (aggregateMultipleValues) {
+ if (row.size() > 1) {
+ List<String> arrayList = new ArrayList<>();
+ row.forEach(rowIndex -> {
+ arrayList.add(multiValueSelector.lookupName(rowIndex));
+ });
+ foundValue =
DimensionHandlerUtils.convertObjectToString(arrayList);
+ } else {
+ foundValue = multiValueSelector.lookupName(row.get(0));
Review Comment:
the other impelementation called
`DimensionHandlerUtils.convertObjectToString` even on a single value - is that
ok; shouldn't they be in sync?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]