clintropolis commented on code in PR #13947:
URL: https://github.com/apache/druid/pull/13947#discussion_r1142991296
##########
processing/src/main/java/org/apache/druid/math/expr/ExprEval.java:
##########
@@ -515,6 +515,9 @@ public static ExprEval ofType(@Nullable ExpressionType
type, @Nullable Object va
if (value instanceof List) {
return bestEffortOf(value);
}
+ if (value instanceof byte[]) {
+ return new StringExprEval(StringUtils.encodeBase64String((byte[])
value));
+ }
Review Comment:
This block is for things that are asking for `STRING` typed values, (though
they might be multi-value strings as well). `COMPLEX` types will accept bytes
as is and try to deserialize them into the appropriate object using a
`TypeStrategy` that wraps the `ObjectStrategy`.
However, to complicate this answer slightly, where the type passed to this
method comes from varies depending on where it is being called. This `ofType`
method is what backs `IdentifierExpr` which is what feeds input values into
expressions. When backed by a segment, the type will be the type which was
stored in the segment, etc. For places we don't know though, such as expression
transforms, we fall back to using `bestEffortOf`, which will handle `byte[]` as
a `STRING` type. It lacks the complex type name so cant handle byte[] to
complex object translation, so we turn it into a string because at least then
something could do something with it.
Responding to this made me realize that the expression post-agg bindings
could be improved with the partial type information derived from the
aggregators 'result' type, so I have updated them to use it accordingly in the
latest commit.
Back to `byte[]`, we could alternatively consider leaving it as 'unknown'
`COMPLEX`, though that would cause some issue with nested columns which is the
other main user of 'bestEffortOf', which uses it to try to derive the type
information of these values. Since we don't have a native binary blob type,
`STRING` is most useful here so we can at least preserve the values (and for
JSON, `byte[]` already come in as base64 strings, so byte[] really only appear
in other nested formats, such as Avro, Parquet, Protobuf, and ORC).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]