clintropolis commented on code in PR #12873:
URL: https://github.com/apache/druid/pull/12873#discussion_r940598613
##########
core/src/main/java/org/apache/druid/java/util/common/parsers/JSONFlattenerMaker.java:
##########
@@ -143,11 +143,19 @@ private Object convertJsonNode(JsonNode val)
return charsetFix(val.asText());
}
+ if (val.isBoolean()) {
+ return val.asBoolean();
+ }
+
+ if (val.isBinary() && val instanceof BinaryNode) {
+ return ((BinaryNode) val).binaryValue();
Review Comment:
>Will this ever get generated from raw JSON? Does Jackson have some code
that detects stuff that looks like Base64 and then decodes it?
I don't think it ever happens by default, at least didn't in my tests with
the default object mapper I used in tests which read them into `TextNode`. I
wasn't sure if it was possible to happen somehow though via configuration or
custom deserializers, so I added the case for it in the event it can happen
some way.
> If so: that seems weird to me: IMO we should return the Base64 string, not
the decoded byte[] here. If people want to decode it then we should provide a
base64_decode function.
I thought about this, but we do have a few places that can handle `byte[]`
directly, this one since they can come from other input formats
https://github.com/apache/druid/blob/master/core/src/main/java/org/apache/druid/data/input/Rows.java#L70
and the other I was thinking of is
https://github.com/apache/druid/blob/master/core/src/main/java/org/apache/druid/math/expr/ExprEval.java#L471
which when creating a complex typed `ExprEval` will try to base64 decode
`String` into `byte[]` to deserialize into a complex object. Avro, Parquet, and
ORC have a `binaryAsString` option that will potentially convert binary stuff
as UTF8 Strings, but that is slightly different behavior than what we are
discussing here.
I don't feel especially strongly, especially since the complex expression
stuff can handle both `byte[]` and `String`, but it also doesn't really seem
harmful because if it ends up in an input row as a string column it will be
base64 serialized.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]