XuQianJin-Stars opened a new pull request, #2079:
URL: https://github.com/apache/fluss/pull/2079
### Purpose
Linked issue: close #1974
This PR adds support for the nested ROW type in the ARROW, COMPACTED, and
INDEXED formats, so that Fluss can handle complex nested data structures,
including nested rows and nested arrays.
### Brief change log
**Core Changes:**
- Added `ArrowRowColumnVector` to support reading ROW type from Arrow format
- Added `ArrowRowWriter` to support writing ROW type to Arrow format
- Introduced `RowColumnVector` interface for columnar row operations
- Added `RowSerializer` for ROW type serialization/deserialization
- Enhanced `IndexedRowWriter` and `IndexedRowReader` to support nested ROW
type
- Enhanced `CompactedRowWriter` and `CompactedRowReader` to support nested
ROW type
**Type System Enhancements:**
- Extended `InternalRow` and `InternalArray` interfaces with `getRow()`
method
- Updated `DataGetters` to include ROW type getter
- Enhanced `ArrowUtils` to create Arrow column vectors for ROW type
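
To illustrate what a positional `getRow()` accessor makes possible, here is a minimal, self-contained sketch. It is not the Fluss code: `SimpleRow`, `ObjectArrayRow`, and `NestedRowAccessSketch` are hypothetical names, and the `getRow(int pos, int numFields)` signature is an assumption modeled on Flink's `RowData#getRow`, not confirmed from this PR.

```java
/** Hypothetical, simplified row accessor; NOT the actual Fluss InternalRow interface. */
interface SimpleRow {
    int getFieldCount();
    boolean isNullAt(int pos);
    int getInt(int pos);
    long getLong(int pos);
    String getString(int pos);
    /** Assumed signature: returns the nested row at pos, which must have numFields fields. */
    SimpleRow getRow(int pos, int numFields);
}

/** Hypothetical row backed by a plain Object[]; nested rows are stored as SimpleRow values. */
final class ObjectArrayRow implements SimpleRow {
    private final Object[] fields;

    ObjectArrayRow(Object... fields) {
        this.fields = fields;
    }

    public int getFieldCount() { return fields.length; }
    public boolean isNullAt(int pos) { return fields[pos] == null; }
    public int getInt(int pos) { return (Integer) fields[pos]; }
    public long getLong(int pos) { return (Long) fields[pos]; }
    public String getString(int pos) { return (String) fields[pos]; }

    public SimpleRow getRow(int pos, int numFields) {
        // Callers are expected to check isNullAt(pos) first, as with the other getters.
        SimpleRow nested = (SimpleRow) fields[pos];
        if (nested.getFieldCount() != numFields) {
            throw new IllegalArgumentException(
                    "expected " + numFields + " fields, found " + nested.getFieldCount());
        }
        return nested;
    }
}

final class NestedRowAccessSketch {
    /** Reads a row shaped like ROW(INT, ROW(INT, STRING, BIGINT), STRING). */
    static String describe(SimpleRow outer) {
        SimpleRow inner = outer.getRow(1, 3);
        String innerText = inner.isNullAt(1) ? "null" : inner.getString(1);
        return outer.getInt(0) + "/" + inner.getInt(0) + "/" + innerText
                + "/" + inner.getLong(2) + "/" + outer.getString(2);
    }

    public static void main(String[] args) {
        SimpleRow row = new ObjectArrayRow(1, new ObjectArrayRow(2, "b", 3L), "c");
        System.out.println(describe(row));
    }
}
```

The nested value is read through the same interface as the outer row, which is what lets readers and converters recurse to any nesting depth.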
**Connector Integration:**
- Added `FlinkRowConverter` and `FlinkArrayConverter` for Flink-Fluss type
conversion
- Added `PaimonRowAsFlussRow` and `PaimonArrayAsFlussArray` for Paimon-Fluss
type conversion
- Updated existing converters to support nested structures
**Test Coverage:**
- Enhanced `ArrowReaderWriterTest` with nested ROW and nested ARRAY test
cases
- Updated `IndexedRowTest` and `IndexedRowReaderTest` for ROW type validation
Together, these changes let Fluss store and process nested rows and nested
arrays, which analytics workloads with hierarchical data models rely on.
### Tests
This PR includes the following unit tests to verify nested ROW and ARRAY type
support in the Arrow and Indexed formats:
**Unit Tests:**
- `ArrowReaderWriterTest#testReaderWriter()` - Validates that Arrow reader
and writer can correctly handle nested ROW and nested ARRAY types
  - Tests nested ARRAY: `ARRAY(ARRAY(STRING))`
  - Tests nested ROW: `ROW(INT, ROW(INT, STRING, BIGINT), STRING)`
  - Verifies null value handling in nested structures
  - Validates correct serialization and deserialization of complex nested
    types
- `IndexedRowReaderTest` - Verifies IndexedRow format support for ROW type
read/write operations
- `IndexedRowTest` - Validates IndexedRow handling of nested types in
various scenarios
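
For readers unfamiliar with this kind of test, the sketch below shows the general shape of a round-trip check on a nested row with a null inside the inner row. It is an illustration only: the byte layout (a null bitmap followed by the non-null fields) is a simplified assumption and not the actual ARROW, COMPACTED, or INDEXED layout, and `NestedRowRoundTripSketch`, `Kind`, and `Type` are hypothetical names that do not come from this PR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class NestedRowRoundTripSketch {
    enum Kind { INT, BIGINT, STRING, ROW }

    /** Hypothetical type descriptor; fields is only populated when kind == ROW. */
    static final class Type {
        final Kind kind;
        final Type[] fields;

        private Type(Kind kind, Type... fields) {
            this.kind = kind;
            this.fields = fields;
        }

        static Type of(Kind kind) { return new Type(kind); }
        static Type row(Type... fields) { return new Type(Kind.ROW, fields); }
    }

    /** Writes a null bitmap, then each non-null field; nested rows recurse. */
    static void writeRow(DataOutputStream out, Type rowType, Object[] values) throws IOException {
        int n = rowType.fields.length;
        byte[] nullBits = new byte[(n + 7) / 8];
        for (int i = 0; i < n; i++) {
            if (values[i] == null) {
                nullBits[i / 8] |= (byte) (1 << (i % 8));
            }
        }
        out.write(nullBits);
        for (int i = 0; i < n; i++) {
            if (values[i] == null) {
                continue;
            }
            switch (rowType.fields[i].kind) {
                case INT: out.writeInt((Integer) values[i]); break;
                case BIGINT: out.writeLong((Long) values[i]); break;
                case STRING: out.writeUTF((String) values[i]); break;
                case ROW: writeRow(out, rowType.fields[i], (Object[]) values[i]); break;
            }
        }
    }

    /** Mirrors writeRow: reads the null bitmap, then only the fields marked non-null. */
    static Object[] readRow(DataInputStream in, Type rowType) throws IOException {
        int n = rowType.fields.length;
        byte[] nullBits = new byte[(n + 7) / 8];
        in.readFully(nullBits);
        Object[] values = new Object[n];
        for (int i = 0; i < n; i++) {
            if ((nullBits[i / 8] & (1 << (i % 8))) != 0) {
                continue; // field stays null
            }
            switch (rowType.fields[i].kind) {
                case INT: values[i] = in.readInt(); break;
                case BIGINT: values[i] = in.readLong(); break;
                case STRING: values[i] = in.readUTF(); break;
                case ROW: values[i] = readRow(in, rowType.fields[i]); break;
            }
        }
        return values;
    }

    /** Serializes and deserializes a row in memory. */
    static Object[] roundTrip(Type rowType, Object[] values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                writeRow(out, rowType, values);
            }
            try (DataInputStream in =
                    new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return readRow(in, rowType);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // ROW(INT, ROW(INT, STRING, BIGINT), STRING), the shape exercised by the test above.
        Type inner = Type.row(Type.of(Kind.INT), Type.of(Kind.STRING), Type.of(Kind.BIGINT));
        Type outer = Type.row(Type.of(Kind.INT), inner, Type.of(Kind.STRING));
        Object[] original = {1, new Object[] {2, null, 3L}, "c"};
        Object[] copy = roundTrip(outer, original);
        System.out.println(Arrays.deepEquals(original, copy));
    }
}
```

The point of the pattern is that the null marker for the inner STRING field must survive the round trip without shifting the fields that follow it, which is the failure mode a null-handling test for nested structures is meant to catch.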
**Test Coverage:**
- Arrow format nested ROW type read/write
- Arrow format nested ARRAY type read/write
- Compacted format ROW type support
- Indexed format ROW type support
- Type conversion integration with Flink and Paimon connectors
**Test Data:**
- Multi-level nested structures with various primitive types
- Mixed scenarios of null and non-null values
- Comprehensive validation of all basic types within nested structures
All test cases pass with `mvn clean verify`.
### API and Format
This change extends the storage formats:
- Extends ARROW format to support nested ROW type structures
- Extends COMPACTED format to support ROW type serialization
- Extends INDEXED format to support ROW type serialization
- No breaking changes to existing API or storage format
- Backward compatible with existing data
### Documentation
This change introduces a new feature (nested ROW type support).
Documentation is not required as per user request.