sidpn opened a new pull request, #17463:
URL: https://github.com/apache/pinot/pull/17463

## Summary

This PR implements nested dictionary encoding support for the Apache Arrow format in Pinot, addressing the limitation noted in PR #17031.
   
## Changes

### Core Implementation
- **ArrowToGenericRowConverter.java**: Added recursive dictionary-decoding methods
  - `extractValueFromVector()`: Core helper that decodes dictionary-encoded values and handles nulls
  - `extractListValue()`: Support for `List<DictEncodedType>`
  - `extractStructValue()`: Support for `Struct` with dictionary-encoded fields
  - `extractMapValue()`: Support for `Map` with dictionary-encoded keys and/or values
  - Modified `convertSingleRow()` to route complex types (`List`, `Struct`, `Map`) to the matching helper
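To illustrate the kind of recursion these helpers perform, here is a minimal plain-Java sketch. It does not use the Arrow API and is not the PR's implementation of `extractValueFromVector()`: `DictRef` is a hypothetical stand-in for one dictionary-encoded slot (an index plus its dictionary), `java.util.List` stands in for an Arrow list, and `java.util.Map` stands in for both structs (field name to value) and maps.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, Arrow-free model of recursive dictionary decoding. */
final class NestedDictDecodeSketch {

  /** Hypothetical stand-in for one dictionary-encoded slot: an index into a dictionary. */
  static final class DictRef {
    final Integer index;      // null models a null slot in the index vector
    final List<?> dictionary; // the values the index points into

    DictRef(Integer index, List<?> dictionary) {
      this.index = index;
      this.dictionary = dictionary;
    }
  }

  /** Replaces every DictRef, at any nesting depth, with its dictionary value; nulls stay null. */
  static Object decode(Object value) {
    if (value == null) {
      return null;
    }
    if (value instanceof DictRef) {
      DictRef ref = (DictRef) value;
      // Preserve nulls instead of looking up an index that does not exist.
      return ref.index == null ? null : ref.dictionary.get(ref.index);
    }
    if (value instanceof List) {
      // List<DictEncodedType>: decode element by element.
      List<Object> out = new ArrayList<>();
      for (Object element : (List<?>) value) {
        out.add(decode(element));
      }
      return out;
    }
    if (value instanceof Map) {
      // Struct (field name -> value) or Map (key -> value): decode keys and values.
      Map<Object, Object> out = new LinkedHashMap<>();
      for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
        out.put(decode(entry.getKey()), decode(entry.getValue()));
      }
      return out;
    }
    return value; // plain (non-dictionary) values pass through unchanged
  }
}
```

For example, decoding a list of `DictRef`s with indices `2`, `null`, `0` against the dictionary `["red", "green", "blue"]` yields `["blue", null, "red"]`. Because lists and maps recurse into `decode`, the same function handles any nesting depth in this model.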
   
### Test Coverage
- **ArrowTestDataUtil.java**: Added test data generators
  - `createListWithDictEncodedElementsData()`
  - `createStructWithDictEncodedFieldsData()`
  - `createMapWithDictEncodedKeysData()`

- **ArrowMessageDecoderTest.java**: Added unit tests
  - `testListWithDictEncodedElements()`
  - `testStructWithDictEncodedFields()`
  - `testMapWithDictEncodedKeys()`
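The test data generators have to produce the encoded form of a column: a dictionary of distinct values plus one index per row. A minimal plain-Java sketch of that encoding step, assuming dictionary entries are kept in first-seen order and null values get a `null` index (the class and method names are hypothetical, and the generators in `ArrowTestDataUtil.java` may build their vectors differently):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that dictionary-encodes a column of strings. */
final class DictEncodeSketch {

  /** Encoded column: distinct values plus one index (or null) per input row. */
  static final class Encoded {
    final List<String> dictionary = new ArrayList<>();
    final List<Integer> indices = new ArrayList<>();
  }

  static Encoded encode(List<String> values) {
    Encoded encoded = new Encoded();
    Map<String, Integer> positions = new HashMap<>(); // value -> position in dictionary
    for (String value : values) {
      if (value == null) {
        encoded.indices.add(null); // nulls live in the index vector, not the dictionary
        continue;
      }
      Integer position = positions.get(value);
      if (position == null) {
        // First occurrence: append the value to the dictionary.
        position = encoded.dictionary.size();
        encoded.dictionary.add(value);
        positions.put(value, position);
      }
      encoded.indices.add(position);
    }
    return encoded;
  }
}
```

Encoding `["a", "b", "a", null, "c"]` gives the dictionary `["a", "b", "c"]` and the indices `[0, 1, 0, null, 2]`; decoding reverses this by looking each non-null index up in the dictionary.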
   
## Features

- ✅ Recursive dictionary decoding for nested structures
- ✅ Null values preserved throughout decoding
- ✅ Support for arbitrary nesting depth via recursion (nesting of 3-4 levels is not yet tested)
- ✅ Backwards compatible with existing code
- ✅ Error handling with contextual logging
   
## Testing

**Patterns Tested:**
- `List<DictString>`
- `Struct<name: DictString, age: int>`
- `Map<DictString, int>`
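For the `Struct<name: DictString, age: int>` pattern, Arrow stores each struct field in its own child vector, so building row `i` means reading position `i` from every child and resolving only the dictionary-encoded one. A hedged sketch of that row assembly, with plain arrays standing in for the child vectors and a `boolean[]` for the struct's own validity (all names here are hypothetical, not Pinot or Arrow APIs):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, Arrow-free sketch of turning a Struct<name: DictString, age: int> column into rows. */
final class StructDictRowSketch {

  static List<Map<String, Object>> toRows(
      boolean[] structIsNull,      // struct-level validity (true = the whole struct is null)
      Integer[] nameIndices,       // child "name": dictionary indices, null = null value
      List<String> nameDictionary, // dictionary backing the "name" child
      Integer[] ages) {            // child "age": plain, non-encoded values
    List<Map<String, Object>> rows = new ArrayList<>();
    for (int row = 0; row < structIsNull.length; row++) {
      if (structIsNull[row]) {
        rows.add(null); // a null struct stays null rather than becoming a map of nulls
        continue;
      }
      Map<String, Object> fields = new LinkedHashMap<>();
      Integer index = nameIndices[row];
      // Only the dictionary-encoded child needs a lookup.
      fields.put("name", index == null ? null : nameDictionary.get(index));
      fields.put("age", ages[row]); // the non-dictionary child is copied as-is
      rows.add(fields);
    }
    return rows;
  }
}
```

Keeping the null check at the struct level separate from the per-field null check mirrors the two levels of validity in the columnar layout: a struct can be null as a whole, or present with a null `name`.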
   
**Backwards Compatibility:**
- All existing tests are expected to pass; handling of flat dictionary-encoded columns and of non-dictionary nested structures is unchanged.
   
## Known Limitations

**Not Yet Tested (Future Work):**
- Deep nesting (3-4 levels)
- Complex nested combinations (e.g. `List<Struct<field: DictString>>`)
- Error edge cases (invalid dictionary ID, out-of-bounds index)
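The out-of-bounds case can be guarded with a bounds-checked lookup that fails with context (field path, index, dictionary size) instead of a bare `IndexOutOfBoundsException`. A minimal sketch; the class name, method name, and message format are hypothetical, and throwing here rather than logging and returning `null` is a design choice the PR may make differently. It does not cover the invalid-dictionary-ID case, which concerns resolving the dictionary itself rather than indexing into it.

```java
import java.util.List;

/** Hypothetical bounds-checked dictionary lookup for the out-of-bounds edge case. */
final class SafeDictLookup {

  static <T> T lookup(Integer index, List<T> dictionary, String fieldPath) {
    if (index == null) {
      return null; // null slot: nothing to look up
    }
    if (index < 0 || index >= dictionary.size()) {
      // Include enough context to find the offending field in a nested schema.
      throw new IllegalArgumentException(String.format(
          "Dictionary index %d out of range [0, %d) for field '%s'",
          index, dictionary.size(), fieldPath));
    }
    return dictionary.get(index);
  }
}
```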
   
## Test Plan

1. Run the existing Arrow decoder tests to verify there are no regressions
2. Run the new nested dictionary tests to verify the new functionality
3. Test with real Arrow data containing nested dictionary-encoded structures
   
## Related

- Addresses the limitation from PR #17031

