clintropolis commented on code in PR #15434:
URL: https://github.com/apache/druid/pull/15434#discussion_r1408885801
##########
processing/src/main/java/org/apache/druid/query/aggregation/any/StringAnyBufferAggregator.java:
##########
@@ -65,6 +67,27 @@ public void aggregate(ByteBuffer buf, int position)
}
}
+ private String readValue(Object object)
+ {
+ if (object == null) {
+ return null;
+ }
+ if (object instanceof List) {
+ List<Object> objectList = (List) object;
+ if (objectList.size() == 0) {
+ return null;
+ }
+ if (objectList.size() == 1) {
+ return DimensionHandlerUtils.convertObjectToString(objectList.get(0));
+ }
+ if (aggregateMultipleValues) {
+ return DimensionHandlerUtils.convertObjectToString(objectList);
+ }
+ return DimensionHandlerUtils.convertObjectToString(objectList.get(0));
Review Comment:
now that I think about it, for the non-vectorized engine, string object
selectors actually already unwrap the list if it is only a single element, so I
don't think we might not actually need special handling for list size 1
https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/DimensionSelector.java#L141
which is called by
https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/DimensionSelector.java#L128
which is what is used by the multi-value dimension selector when reading from
segments
https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/column/StringUtf8DictionaryEncodedColumn.java#L229
(no other dimension selectors for mvds override this behavior)
sorry for the confusion, the value will always be a `String` or `null` in
the case of size 1 or size 0 when using getObject().
##########
docs/querying/sql-aggregations.md:
##########
@@ -90,7 +90,7 @@ In the aggregation functions supported by Druid, only
`COUNT`, `ARRAY_AGG`, and
|`EARLIEST_BY(expr, timestampExpr, [maxBytesPerValue])`|Returns the earliest
value of `expr`.<br />The earliest value of `expr` is taken from the row with
the overall earliest non-null value of `timestampExpr`. <br />If the earliest
non-null value of `timestampExpr` appears in multiple rows, the `expr` may be
taken from any of those rows.<br /><br />If `expr` is a string or complex type
`maxBytesPerValue` amount of space is allocated for the aggregation. Strings
longer than this limit are truncated. The `maxBytesPerValue` parameter should
be set as low as possible, since high values will lead to wasted memory.<br/>If
`maxBytesPerValue`is omitted; it defaults to `1024`. |`null` or `0`/`''` if
`druid.generic.useDefaultValueForNull=true` (legacy mode)|
|`LATEST(expr, [maxBytesPerValue])`|Returns the latest value of `expr`<br
/>The `expr` must come from a relation with a timestamp column (like `__time`
in a Druid datasource) and the "latest" is taken from the row with the overall
latest non-null value of the timestamp column.<br />If the latest non-null
value of the timestamp column appears in multiple rows, the `expr` may be taken
from any of those rows.<br /><br />If `expr` is a string or complex type
`maxBytesPerValue` amount of space is allocated for the aggregation. Strings
longer than this limit are truncated. The `maxBytesPerValue` parameter should
be set as low as possible, since high values will lead to wasted memory.<br/>If
`maxBytesPerValue`is omitted; it defaults to `1024`. |`null` or `0`/`''` if
`druid.generic.useDefaultValueForNull=true` (legacy mode)|
|`LATEST_BY(expr, timestampExpr, [maxBytesPerValue])`|Returns the latest value
of `expr`.<br />The latest value of `expr` is taken from the row with the
overall latest non-null value of `timestampExpr`.<br />If the overall latest
non-null value of `timestampExpr` appears in multiple rows, the `expr` may be
taken from any of those rows.<br /><br />If `expr` is a string or complex type
`maxBytesPerValue` amount of space is allocated for the aggregation. Strings
longer than this limit are truncated. The `maxBytesPerValue` parameter should
be set as low as possible, since high values will lead to wasted memory.<br/>If
`maxBytesPerValue`is omitted; it defaults to `1024`. |`null` or `0`/`''` if
`druid.generic.useDefaultValueForNull=true` (legacy mode)|
-|`ANY_VALUE(expr, [maxBytesPerValue])`|Returns any value of `expr` including
null. This aggregator can simplify and optimize the performance by returning
the first encountered value (including `null`).<br /><br />If `expr` is a
string or complex type `maxBytesPerValue` amount of space is allocated for the
aggregation. Strings longer than this limit are truncated. The
`maxBytesPerValue` parameter should be set as low as possible, since high
values will lead to wasted memory.<br/>If `maxBytesPerValue`is omitted; it
defaults to `1024`. |`null` or `0`/`''` if
`druid.generic.useDefaultValueForNull=true` (legacy mode)|
+|`ANY_VALUE(expr, [maxBytesPerValue, [aggregateMultipleValues]])`|Returns any
value of `expr` including null. This aggregator can simplify and optimize the
performance by returning the first encountered value (including `null`).<br
/><br />If `expr` is a string or complex type `maxBytesPerValue` amount of
space is allocated for the aggregation. Strings longer than this limit are
truncated. The `maxBytesPerValue` parameter should be set as low as possible,
since high values will lead to wasted memory.<br/>If `maxBytesPerValue`is
omitted; it defaults to `1024`. `aggregateMultipleValues` is optional boolean
flag defines behavior of aggregating multiple values in field.
`aggregateMultipleValues` is set as true by default and returns the array in
case of Multiple values (MVD) in field. By setting it to false, function will
return first encountered value in case of MVDs. |`null` or `0`/`''` if
`druid.generic.useDefaultValueForNull=true` (legacy mode)|
Review Comment:
we should consider linking to
https://github.com/apache/druid/blob/master/docs/querying/multi-value-dimensions.md
here i think
```suggestion
|`ANY_VALUE(expr, [maxBytesPerValue, [aggregateMultipleValues]])`|Returns
any value of `expr` including null. This aggregator can simplify and optimize
the performance by returning the first encountered value (including `null`).<br
/><br />If `expr` is a string or complex type `maxBytesPerValue` amount of
space is allocated for the aggregation. Strings longer than this limit are
truncated. The `maxBytesPerValue` parameter should be set as low as possible,
since high values will lead to wasted memory.<br/>If `maxBytesPerValue` is
omitted; it defaults to `1024`. `aggregateMultipleValues` is an optional
boolean flag controls the behavior of aggregating a [multi-value
dimension](./multi-value-dimensions.md). `aggregateMultipleValues` is set as
true by default and returns the stringified array in case of a multi-value
dimension. By setting it to false, function will return first value instead.
|`null` or `0`/`''` if `druid.generic.useDefaultValueForNull=true` (legacy
mode)|
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]