clintropolis commented on code in PR #12873:
URL: https://github.com/apache/druid/pull/12873#discussion_r940598613


##########
core/src/main/java/org/apache/druid/java/util/common/parsers/JSONFlattenerMaker.java:
##########
@@ -143,11 +143,19 @@ private Object convertJsonNode(JsonNode val)
       return charsetFix(val.asText());
     }
 
+    if (val.isBoolean()) {
+      return val.asBoolean();
+    }
+
+    if (val.isBinary() && val instanceof BinaryNode) {
+      return ((BinaryNode) val).binaryValue();

Review Comment:
   >Will this ever get generated from raw JSON? Does Jackson have some code 
that detects stuff that looks like Base64 and then decodes it?
   
   I don't think it ever happens by default, at least didn't in my tests with 
the default object mapper I used in tests which read them into `TextNode`. I 
wasn't sure if it was possible to happen somehow though via configuration or 
custom deserializers, so I added the case for it in the event it can happen 
some way.
   
   > If so: that seems weird to me: IMO we should return the Base64 string, not 
the decoded byte[] here. If people want to decode it then we should provide a 
base64_decode function.
   
   I thought about this, but we do have a few places that can handle `byte[]` 
directly, this one since they can come from other input formats 
https://github.com/apache/druid/blob/master/core/src/main/java/org/apache/druid/data/input/Rows.java#L70
   and the other I was thinking of is
   
https://github.com/apache/druid/blob/master/core/src/main/java/org/apache/druid/math/expr/ExprEval.java#L471
 which when creating a complex typed `ExprEval` will try to base64 decode 
`String` into `byte[]` to deserialize into a complex object. Avro, Parquet, and 
ORC have a `binaryAsString` option that will potentially convert binary stuff 
as UTF8 Strings, but that is slightly different behavior than what we are 
discussing here.
   
   I don't feel especially strongly, especially since the complex expression 
stuff can handle both `byte[]` and `String`, but it also doesn't really seem 
harmful because if it ends up in an input row as a string column it will be 
base64 serialized.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to