clintropolis commented on code in PR #15434:
URL: https://github.com/apache/druid/pull/15434#discussion_r1408885801


##########
processing/src/main/java/org/apache/druid/query/aggregation/any/StringAnyBufferAggregator.java:
##########
@@ -65,6 +67,27 @@ public void aggregate(ByteBuffer buf, int position)
     }
   }
 
+  private String readValue(Object object)
+  {
+    if (object == null) {
+      return null;
+    }
+    if (object instanceof List) {
+      List<Object> objectList = (List) object;
+      if (objectList.size() == 0) {
+        return null;
+      }
+      if (objectList.size() == 1) {
+        return DimensionHandlerUtils.convertObjectToString(objectList.get(0));
+      }
+      if (aggregateMultipleValues) {
+        return DimensionHandlerUtils.convertObjectToString(objectList);
+      }
+      return DimensionHandlerUtils.convertObjectToString(objectList.get(0));

Review Comment:
   now that I think about it, for the non-vectorized engine, string object 
selectors actually already unwrap the list if it is only a single element, so I 
don't think we might not actually need special handling for list size 1
   
https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/DimensionSelector.java#L141
 which is called by 
https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/DimensionSelector.java#L128
 which is what is used by the multi-value dimension selector when reading from 
segments 
https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/column/StringUtf8DictionaryEncodedColumn.java#L229
 (no other dimension selectors for mvds override this behavior)
   
   sorry for the confusion, the value will always be a `String` or `null` in 
the case of size 1 or size 0 when using getObject(). 



##########
docs/querying/sql-aggregations.md:
##########
@@ -90,7 +90,7 @@ In the aggregation functions supported by Druid, only 
`COUNT`, `ARRAY_AGG`, and
 |`EARLIEST_BY(expr, timestampExpr, [maxBytesPerValue])`|Returns the earliest 
value of `expr`.<br />The earliest value of `expr` is taken from the row with 
the overall earliest non-null value of `timestampExpr`. <br />If the earliest 
non-null value of `timestampExpr` appears in multiple rows, the `expr` may be 
taken from any of those rows.<br /><br />If `expr` is a string or complex type 
`maxBytesPerValue` amount of space is allocated for the aggregation. Strings 
longer than this limit are truncated. The  `maxBytesPerValue` parameter should 
be set as low as possible, since high values will lead to wasted memory.<br/>If 
`maxBytesPerValue`is omitted; it defaults to `1024`. |`null` or `0`/`''` if 
`druid.generic.useDefaultValueForNull=true` (legacy mode)|
 |`LATEST(expr, [maxBytesPerValue])`|Returns the latest value of `expr`<br 
/>The `expr` must come from a relation with a timestamp column (like `__time` 
in a Druid datasource) and the "latest" is taken from the row with the overall 
latest non-null value of the timestamp column.<br />If the latest non-null 
value of the timestamp column appears in multiple rows, the `expr` may be taken 
from any of those rows.<br /><br />If `expr` is a string or complex type 
`maxBytesPerValue` amount of space is allocated for the aggregation. Strings 
longer than this limit are truncated. The  `maxBytesPerValue` parameter should 
be set as low as possible, since high values will lead to wasted memory.<br/>If 
`maxBytesPerValue`is omitted; it defaults to `1024`. |`null` or `0`/`''` if 
`druid.generic.useDefaultValueForNull=true` (legacy mode)|
 |`LATEST_BY(expr, timestampExpr, [maxBytesPerValue])`|Returns the latest value 
of `expr`.<br />The latest value of `expr` is taken from the row with the 
overall latest non-null value of `timestampExpr`.<br />If the overall latest 
non-null value of `timestampExpr` appears in multiple rows, the `expr` may be 
taken from any of those rows.<br /><br />If `expr` is a string or complex type 
`maxBytesPerValue` amount of space is allocated for the aggregation. Strings 
longer than this limit are truncated. The `maxBytesPerValue` parameter should 
be set as low as possible, since high values will lead to wasted memory.<br/>If 
`maxBytesPerValue`is omitted; it defaults to `1024`. |`null` or `0`/`''` if 
`druid.generic.useDefaultValueForNull=true` (legacy mode)|
-|`ANY_VALUE(expr, [maxBytesPerValue])`|Returns any value of `expr` including 
null. This aggregator can simplify and optimize the performance by returning 
the first encountered value (including `null`).<br /><br />If `expr` is a 
string or complex type `maxBytesPerValue` amount of space is allocated for the 
aggregation. Strings longer than this limit are truncated. The 
`maxBytesPerValue` parameter should be set as low as possible, since high 
values will lead to wasted memory.<br/>If `maxBytesPerValue`is omitted; it 
defaults to `1024`. |`null` or `0`/`''` if 
`druid.generic.useDefaultValueForNull=true` (legacy mode)|
+|`ANY_VALUE(expr, [maxBytesPerValue, [aggregateMultipleValues]])`|Returns any 
value of `expr` including null. This aggregator can simplify and optimize the 
performance by returning the first encountered value (including `null`).<br 
/><br />If `expr` is a string or complex type `maxBytesPerValue` amount of 
space is allocated for the aggregation. Strings longer than this limit are 
truncated. The `maxBytesPerValue` parameter should be set as low as possible, 
since high values will lead to wasted memory.<br/>If `maxBytesPerValue`is 
omitted; it defaults to `1024`. `aggregateMultipleValues` is optional boolean 
flag defines behavior of aggregating multiple values in field. 
`aggregateMultipleValues` is set as true by default and returns the array in 
case of Multiple values (MVD) in field. By setting it to false, function will 
return first encountered value in case of MVDs. |`null` or `0`/`''` if 
`druid.generic.useDefaultValueForNull=true` (legacy mode)|

Review Comment:
   we should consider linking to 
https://github.com/apache/druid/blob/master/docs/querying/multi-value-dimensions.md
 here i think
   ```suggestion
   |`ANY_VALUE(expr, [maxBytesPerValue, [aggregateMultipleValues]])`|Returns 
any value of `expr` including null. This aggregator can simplify and optimize 
the performance by returning the first encountered value (including `null`).<br 
/><br />If `expr` is a string or complex type `maxBytesPerValue` amount of 
space is allocated for the aggregation. Strings longer than this limit are 
truncated. The `maxBytesPerValue` parameter should be set as low as possible, 
since high values will lead to wasted memory.<br/>If `maxBytesPerValue` is 
omitted; it defaults to `1024`. `aggregateMultipleValues` is an optional 
boolean flag controls the behavior of aggregating a [multi-value 
dimension](./multi-value-dimensions.md). `aggregateMultipleValues` is set as 
true by default and returns the stringified array in case of a multi-value 
dimension. By setting it to false, function will return first value instead. 
|`null` or `0`/`''` if `druid.generic.useDefaultValueForNull=true` (legacy 
mode)|
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to