sidpn opened a new pull request, #17463:
URL: https://github.com/apache/pinot/pull/17463
## Summary
This PR adds support for decoding dictionary-encoded fields nested inside Apache Arrow complex types (List, Struct, and Map) in Pinot, addressing the limitation noted in PR #17031.
## Changes
### Core Implementation
- **ArrowToGenericRowConverter.java**: Added recursive dictionary decoding methods
  - `extractValueFromVector()`: Core helper for dictionary decoding with null handling
  - `extractListValue()`: Support for `List<DictEncodedType>`
  - `extractStructValue()`: Support for `Struct` with dictionary-encoded fields
  - `extractMapValue()`: Support for `Map` with dictionary-encoded keys and/or values
  - Modified `convertSingleRow()` to route complex types (List, Struct, Map) to the appropriate helper
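The recursive approach described above can be illustrated with a small, self-contained Java sketch. It does not use Arrow Java: `Column`, `DictColumn`, `PlainColumn`, `ListColumn`, `StructColumn`, `MapColumn`, and `NestedDictDecoder.decodeValue` are hypothetical stand-ins invented for this example, not the classes or methods in this PR. The sketch only assumes that every container decodes its children by calling back into one shared value extractor, with a null check at each level.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// HYPOTHETICAL column model for illustration only; these are NOT Arrow Java
// classes. A null `nulls` array means the column has no null slots.
abstract class Column {
  private final boolean[] nulls;
  Column(boolean[] nulls) { this.nulls = nulls; }
  boolean isNull(int row) { return nulls != null && nulls[row]; }
}

// Leaf whose slots hold indices into a dictionary of actual values.
final class DictColumn extends Column {
  final int[] indices;
  final Object[] dictionary;
  DictColumn(int[] indices, Object[] dictionary, boolean[] nulls) {
    super(nulls);
    this.indices = indices;
    this.dictionary = dictionary;
  }
}

// Leaf that stores values directly, without a dictionary.
final class PlainColumn extends Column {
  final Object[] values;
  PlainColumn(Object[] values, boolean[] nulls) { super(nulls); this.values = values; }
}

// Row i of a list spans child slots [offsets[i], offsets[i + 1]).
final class ListColumn extends Column {
  final int[] offsets;
  final Column child;
  ListColumn(int[] offsets, Column child, boolean[] nulls) {
    super(nulls);
    this.offsets = offsets;
    this.child = child;
  }
}

// Every child column has one slot per struct row.
final class StructColumn extends Column {
  final String[] names;
  final Column[] children;
  StructColumn(String[] names, Column[] children, boolean[] nulls) {
    super(nulls);
    this.names = names;
    this.children = children;
  }
}

// Row i of a map spans entries [offsets[i], offsets[i + 1]) of keys and values.
final class MapColumn extends Column {
  final int[] offsets;
  final Column keys;
  final Column values;
  MapColumn(int[] offsets, Column keys, Column values, boolean[] nulls) {
    super(nulls);
    this.offsets = offsets;
    this.keys = keys;
    this.values = values;
  }
}

final class NestedDictDecoder {
  // Routes each column kind to its handler and recurses into children, so a
  // dictionary-encoded leaf is decoded however deeply it is nested.
  static Object decodeValue(Column col, int row) {
    if (col.isNull(row)) return null; // preserve nulls at every level
    if (col instanceof DictColumn) {
      DictColumn d = (DictColumn) col;
      return d.dictionary[d.indices[row]]; // dictionary index -> actual value
    }
    if (col instanceof PlainColumn) return ((PlainColumn) col).values[row];
    if (col instanceof ListColumn) {
      ListColumn l = (ListColumn) col;
      List<Object> out = new ArrayList<>();
      for (int i = l.offsets[row]; i < l.offsets[row + 1]; i++) out.add(decodeValue(l.child, i));
      return out;
    }
    if (col instanceof StructColumn) {
      StructColumn s = (StructColumn) col;
      Map<String, Object> out = new LinkedHashMap<>();
      for (int f = 0; f < s.names.length; f++) out.put(s.names[f], decodeValue(s.children[f], row));
      return out;
    }
    if (col instanceof MapColumn) {
      MapColumn m = (MapColumn) col;
      Map<Object, Object> out = new LinkedHashMap<>();
      for (int i = m.offsets[row]; i < m.offsets[row + 1]; i++) {
        out.put(decodeValue(m.keys, i), decodeValue(m.values, i));
      }
      return out;
    }
    throw new IllegalArgumentException("Unsupported column: " + col.getClass().getSimpleName());
  }
}
```

Because `ListColumn`, `StructColumn`, and `MapColumn` all recurse through `decodeValue`, a deeper shape such as `List<Struct<field: DictString>>` would take the same path in this sketch; the PR lists such combinations as not yet tested.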
### Test Coverage
- **ArrowTestDataUtil.java**: Added test data generators
  - `createListWithDictEncodedElementsData()`
  - `createStructWithDictEncodedFieldsData()`
  - `createMapWithDictEncodedKeysData()`
- **ArrowMessageDecoderTest.java**: Added unit tests
  - `testListWithDictEncodedElements()`
  - `testStructWithDictEncodedFields()`
  - `testMapWithDictEncodedKeys()`
## Features
- ✅ Recursive dictionary decoding for nested structures
- ✅ Null value preservation throughout decoding
- ✅ Recursive design intended to support arbitrary nesting depth (deep nesting is not yet tested; see Known Limitations)
- ✅ Backwards compatible with existing code
- ✅ Error handling with contextual logging
## Testing
**Patterns Tested:**
- `List<DictString>`
- `Struct<name: DictString, age: int>`
- `Map<DictString, int>`
**Backwards Compatibility:**
- All existing tests should pass; handling of flat dictionary-encoded columns and of non-dictionary nested structures is unchanged.
## Known Limitations
**Not Yet Tested (Future Work):**
- Deep nesting (3-4 levels)
- Complex nested combinations (`List<Struct<field: DictString>>`)
- Error edge cases (invalid dictionary ID, out-of-bounds index)
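As a hypothetical illustration of the error edge cases listed above, the sketch below resolves one dictionary-encoded slot and fails with the field name and row number instead of a bare `ArrayIndexOutOfBoundsException` or `NullPointerException`. `DictLookup.lookup`, its parameters, and the `Map<Long, Object[]>` standing in for a dictionary provider keyed by dictionary id are all assumptions made for this example; they do not describe what the PR currently does. A caller could log the exception message to get the kind of contextual logging mentioned under Features.

```java
import java.util.Map;

// HYPOTHETICAL helper, not code from this PR: validates both the dictionary id
// and the index before lookup so failures carry field/row context.
final class DictLookup {
  private DictLookup() {}

  static Object lookup(Map<Long, Object[]> dictionariesById, long dictId, int index,
      String fieldName, int row) {
    Object[] dictionary = dictionariesById.get(dictId);
    if (dictionary == null) {
      // Invalid dictionary id: no dictionary registered under this id.
      throw new IllegalStateException(String.format(
          "No dictionary with id %d for field '%s' at row %d", dictId, fieldName, row));
    }
    if (index < 0 || index >= dictionary.length) {
      // Out-of-bounds index into an existing dictionary.
      throw new IllegalStateException(String.format(
          "Dictionary index %d out of bounds [0, %d) for field '%s' at row %d",
          index, dictionary.length, fieldName, row));
    }
    return dictionary[index];
  }
}
```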
## Test Plan
1. Run existing Arrow decoder tests to verify no regressions
2. Run new nested dictionary tests to verify functionality
3. Test with real Arrow data containing nested dictionary-encoded structures
## Related
- Addresses limitation from PR #17031
--
This is an automated message from the Apache Git Service.