clintropolis opened a new pull request, #13407:
URL: https://github.com/apache/druid/pull/13407

   ### Description
   This PR fixes a regression caused by #13375 where null ORC inputs would be 
processed into `{}` instead of `null` as expected. 
   
   The regression came from allowing nested types to be returned during 
conversion to support nested ingestion, which exposed an underlying oddity: 
values were ending up as empty maps instead of `null`.
   
   The ORC JSON provider's `isMap` method looks like this:
   ```
     @Override
     public boolean isMap(final Object o)
     {
       return o == null || o instanceof Map || o instanceof OrcStruct;
     }
   ```
   which is a bit strange, but is consistent with the implementations for the 
other nested formats. This means `toPlainJavaObject` will treat `null` as a 
map for most types, resulting in an empty map when converting to Java objects. 
I haven't quite discovered why these are implemented like this (if it was me, I 
cannot remember 😅), but to avoid changing the behavior here, `toMap` now checks 
for a `null` response from `toPlainJavaObject` and returns an empty map if so, 
so that `toPlainJavaObject` itself will not translate `null` into an empty map.
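
A minimal sketch of the null handling described above, for illustration only. The class name `NullAwareConversionSketch` is hypothetical, the explicit `null` check at the top of `toPlainJavaObject` is an assumption about how the conversion avoids the empty map, and only `java.util.Map` is handled (no `OrcStruct`), so this is not the actual Druid code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a flattener JSON provider; not Druid's actual class.
class NullAwareConversionSketch
{
  // Mirrors the quoted isMap: null is still reported as a map.
  static boolean isMap(final Object o)
  {
    return o == null || o instanceof Map;
  }

  // Converts to plain Java objects. Assumption: null is returned as-is
  // before the isMap check, so it never becomes an empty map here.
  static Object toPlainJavaObject(final Object o)
  {
    if (o == null) {
      return null;
    }
    if (isMap(o)) {
      final Map<String, Object> copy = new HashMap<>();
      for (Map.Entry<?, ?> e : ((Map<?, ?>) o).entrySet()) {
        copy.put(String.valueOf(e.getKey()), toPlainJavaObject(e.getValue()));
      }
      return copy;
    }
    return o;
  }

  // toMap keeps its previous behavior: a null input still yields an empty map.
  // Callers are expected to pass only values for which isMap is true.
  @SuppressWarnings("unchecked")
  static Map<String, Object> toMap(final Object o)
  {
    final Object converted = toPlainJavaObject(o);
    if (converted == null) {
      return Collections.emptyMap();
    }
    return (Map<String, Object>) converted;
  }
}
```

With this split, callers of `toMap` see no behavior change, while values that pass through `toPlainJavaObject` keep `null` as `null`.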
   
   While writing a test for this I noticed that `toPlainJavaObject` could still 
leak format-specific types, since the fall-through value was not 'finalized' 
like the values inside of maps and lists are. For example, the JSON `NullNode` 
produced by processing a `null` input row would cause the sampler to explode. 
I'm unsure how common this case is, but it seems safer to also finalize the 
fall-through value.
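
The fall-through finalization could look roughly like the sketch below. All names here are hypothetical: `FORMAT_NULL` stands in for a format-specific marker such as the JSON `NullNode`, and `finalizeConversion` is an assumed helper that maps such markers onto plain Java values. It illustrates the idea rather than the code in this PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not Druid's actual provider.
class FallThroughFinalizeSketch
{
  // Stand-in for a format-specific null marker (for example a JSON NullNode).
  static final Object FORMAT_NULL = new Object();

  // Assumed helper: turns format-specific values into plain Java values.
  static Object finalizeConversion(final Object o)
  {
    return o == FORMAT_NULL ? null : o;
  }

  static Object toPlainJavaObject(final Object o)
  {
    if (o instanceof Map) {
      final Map<String, Object> out = new LinkedHashMap<>();
      for (Map.Entry<?, ?> e : ((Map<?, ?>) o).entrySet()) {
        out.put(String.valueOf(e.getKey()), toPlainJavaObject(e.getValue()));
      }
      return out;
    }
    if (o instanceof List) {
      final List<Object> out = new ArrayList<>();
      for (Object item : (List<?>) o) {
        out.add(toPlainJavaObject(item));
      }
      return out;
    }
    // Fall-through: finalize here as well, so a top-level format-specific
    // value cannot leak out to callers such as the sampler.
    return finalizeConversion(o);
  }
}
```

In this sketch a top-level `FORMAT_NULL` and one nested inside a list or map are both converted to `null`, because every leaf value ends up on the finalized fall-through path.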
   
   <hr>
   
   
   This PR has:
   
   - [x] been self-reviewed.
   - [ ] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [x] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

