clintropolis opened a new pull request, #13406:
URL: https://github.com/apache/druid/pull/13406

   ### Description
   This PR fixes an issue when using `KafkaInputFormat` with Druid nested 
columns, where nested data effectively could not be ingested with this 
format _unless_ the nested columns were added explicitly to the 
`flattenSpec` of the underlying format. This is because the Druid nested 
column indexer and nested data transformation functions such as 
`json_value` rely on the `flattenSpec` machinery to extract and convert data 
from various nested formats into plain Java objects.
   
   The `KafkaInputFormat` was eagerly copying the value payload `Map` (which 
was a flattener) and blending it with the 'header' `Map` to make a composite 
input row. However, nested columns are not currently advertised in the 
flattener's `keySet`, so this copy left the nested data out, and the Druid 
nested indexer and transforms always saw `null` input values.
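
   The failure mode can be illustrated with a minimal sketch. It assumes a hypothetical `LazyFlattenedMap` standing in for a flattener-backed row: `get()` resolves any field from the raw payload, but `entrySet()` only advertises "discovered" fields. These are illustrative classes, not Druid's actual code. Copying such a map with `new HashMap<>(map)` iterates `entrySet()`, so unadvertised nested fields silently disappear.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a flattener-backed row map (not Druid's class):
// only "advertised" fields appear in entrySet()/keySet(), but get() can
// still resolve any field, including nested ones, from the raw payload.
class LazyFlattenedMap extends AbstractMap<String, Object> {
  private final Map<String, Object> raw;
  private final Set<String> advertised;

  LazyFlattenedMap(Map<String, Object> raw, Set<String> advertised) {
    this.raw = raw;
    this.advertised = advertised;
  }

  @Override
  public Object get(Object key) {
    // Resolves on demand, even for keys that are not advertised.
    return raw.get(key);
  }

  @Override
  public boolean containsKey(Object key) {
    return raw.containsKey(key);
  }

  @Override
  public Set<Map.Entry<String, Object>> entrySet() {
    // Iteration exposes only the advertised keys.
    Map<String, Object> visible = new LinkedHashMap<>();
    for (String k : advertised) {
      visible.put(k, raw.get(k));
    }
    return visible.entrySet();
  }
}

class EagerCopyDemo {
  // Mimics the pre-fix approach: copy the payload, then blend in headers.
  static Map<String, Object> eagerMerge(Map<String, Object> payload, Map<String, Object> headers) {
    // HashMap's copy constructor iterates payload.entrySet(), so any key the
    // payload does not advertise is dropped here.
    Map<String, Object> merged = new HashMap<>(payload);
    merged.putAll(headers);
    return merged;
  }
}
```

   With a payload whose only advertised key is `time`, `payload.get("nested")` still returns `{mg=1}`, while the eagerly merged row returns `null` for `nested`, mirroring the assertion failures described below.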
   
   This PR solves the issue by building a `Map` that delegates to the payload 
map before falling back to the header map, so the underlying flattener 
from the payload can continue extracting nested data.
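
   A payload-first delegating map might look roughly like the sketch below. The class name `PayloadFirstMap` is hypothetical and this is not the code from the patch. It assumes a header value is used only when the payload has no (non-null) value for the key. Because lookups go straight to the payload's `get()`, a lazy flattener can still resolve nested fields it does not advertise in its `keySet`.

```java
import java.util.AbstractMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical read-only composite row: lookups delegate to the payload map
// first and fall back to the header map only when the payload returns null.
class PayloadFirstMap extends AbstractMap<String, Object> {
  private final Map<String, Object> payload;
  private final Map<String, Object> headers;

  PayloadFirstMap(Map<String, Object> payload, Map<String, Object> headers) {
    this.payload = payload;
    this.headers = headers;
  }

  @Override
  public Object get(Object key) {
    // No copy of the payload: its own get() still runs, so fields that are
    // resolved on demand (e.g. nested data) remain reachable.
    Object value = payload.get(key);
    return value != null ? value : headers.get(key);
  }

  @Override
  public boolean containsKey(Object key) {
    return payload.containsKey(key) || headers.containsKey(key);
  }

  @Override
  public Set<Map.Entry<String, Object>> entrySet() {
    // Iteration can only see keys the payload advertises. putIfAbsent also
    // replaces null payload values, matching the fallback rule in get().
    Map<String, Object> view = new LinkedHashMap<>(payload);
    headers.forEach(view::putIfAbsent);
    return view.entrySet();
  }
}
```

   The design choice here is to avoid materializing the payload at all for lookups. Only iteration builds a snapshot, and that snapshot is inherently limited to the keys the payload advertises.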
   
   The added test cases all fail prior to the changes in this patch with errors 
of the form:
   ```
   java.lang.AssertionError: 
   Expected :{mg=1}
   Actual   :null
   ```
   
   <hr>
   
   This PR has:
   
   - [x] been self-reviewed.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [x] added comments explaining the "why" and the intent of the code 
wherever it would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [x] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.