[PR] [common] Introduce ROW type for ARROW, COMPACTED and INDEXED formats [fluss]

via GitHub Tue, 02 Dec 2025 03:28:46 -0800


XuQianJin-Stars opened a new pull request, #2079:
URL: https://github.com/apache/fluss/pull/2079


   
   ### Purpose
   
   Linked issue: close #1974
   
   This PR introduces support for the ROW type, including nested ROWs, in the ARROW, COMPACTED, and INDEXED formats, enabling Fluss to handle complex nested data structures such as nested rows and nested arrays.
   
   ### Brief change log
   
   **Core Changes:**
   - Added `ArrowRowColumnVector` to support reading ROW type from Arrow format
   - Added `ArrowRowWriter` to support writing ROW type to Arrow format
   - Introduced `RowColumnVector` interface for columnar row operations
   - Added `RowSerializer` for ROW type serialization/deserialization
   - Enhanced `IndexedRowWriter` and `IndexedRowReader` to support nested ROW type
   - Enhanced `CompactedRowWriter` and `CompactedRowReader` to support nested ROW type
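
For readers less familiar with columnar nested types, the following is a minimal sketch of the usual struct layout, the same idea Arrow uses for struct vectors: a ROW column holds its own null flags plus one child vector per field, and reading row `i` returns a view over position `i` of every child. All names here (`RowVectorSketch`, `ColumnVector`, `IntVector`, `StringVector`, `RowVector`, `RowView`) are hypothetical; this is not the API of the `RowColumnVector` or `ArrowRowColumnVector` classes added by this PR.

```java
import java.util.List;

// Hypothetical sketch of a columnar ROW vector; not the Fluss RowColumnVector API.
final class RowVectorSketch {

    /** Minimal column vector: only the null check is shared by all types. */
    interface ColumnVector {
        boolean isNullAt(int i);
    }

    static final class IntVector implements ColumnVector {
        private final Integer[] values;
        IntVector(Integer... values) { this.values = values; }
        public boolean isNullAt(int i) { return values[i] == null; }
        int getInt(int i) { return values[i]; }
    }

    static final class StringVector implements ColumnVector {
        private final String[] values;
        StringVector(String... values) { this.values = values; }
        public boolean isNullAt(int i) { return values[i] == null; }
        String getString(int i) { return values[i]; }
    }

    /** A ROW column: the struct's own null flags plus one child vector per field. */
    static final class RowVector implements ColumnVector {
        private final boolean[] nulls;
        private final List<ColumnVector> children;
        RowVector(boolean[] nulls, List<ColumnVector> children) {
            this.nulls = nulls;
            this.children = children;
        }
        public boolean isNullAt(int i) { return nulls[i]; }
        // No data is copied: the view reads position i of each child on demand.
        RowView getRow(int i) { return new RowView(children, i); }
    }

    /** Read-only view of one row, backed by position `rowId` of every child vector. */
    static final class RowView {
        private final List<ColumnVector> children;
        private final int rowId;
        RowView(List<ColumnVector> children, int rowId) {
            this.children = children;
            this.rowId = rowId;
        }
        int getFieldCount() { return children.size(); }
        boolean isNullAt(int pos) { return children.get(pos).isNullAt(rowId); }
        int getInt(int pos) { return ((IntVector) children.get(pos)).getInt(rowId); }
        String getString(int pos) { return ((StringVector) children.get(pos)).getString(rowId); }
        RowView getRow(int pos) { return ((RowVector) children.get(pos)).getRow(rowId); }
    }

    public static void main(String[] args) {
        // Column of type ROW<a INT, inner ROW<b STRING>> holding three rows; row 1 is null.
        RowVector inner = new RowVector(
                new boolean[] {false, false, false},
                List.of(new StringVector("x", null, "z")));
        RowVector outer = new RowVector(
                new boolean[] {false, true, false},
                List.of(new IntVector(1, null, 3), inner));
        for (int i = 0; i < 3; i++) {
            if (outer.isNullAt(i)) {
                System.out.println(i + ": null");
                continue;
            }
            RowView row = outer.getRow(i);
            RowView nested = row.getRow(1);
            String b = nested.isNullAt(0) ? "null" : nested.getString(0);
            System.out.println(i + ": a=" + row.getInt(0) + ", b=" + b);
        }
        // prints "0: a=1, b=x", "1: null", "2: a=3, b=z"
    }
}
```

The view is zero-copy: no field values are materialized until a getter is called, which is the property a columnar reader usually wants to keep for nested types.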
   
   **Type System Enhancements:**
   - Extended the `InternalRow` and `InternalArray` interfaces with a `getRow()` method
   - Updated `DataGetters` to include a ROW type getter
   - Enhanced `ArrowUtils` to create Arrow column vectors for ROW type
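
The benefit of a shared getter interface is that rows and arrays expose nested values the same way, so `ARRAY<ROW<...>>` and `ROW<ROW<...>>` need no special cases. The sketch below assumes a `getRow(pos, numFields)` signature modeled on the Flink/Paimon convention; the exact signature of the `getRow()` method added to `InternalRow` and `InternalArray` in this PR may differ, and every type name here (`NestedGettersSketch`, `Getters`, `Row`, `Array`, `GenericRow`, `GenericArray`) is hypothetical.

```java
// Hypothetical sketch: rows and arrays share one getter interface, so nesting composes.
final class NestedGettersSketch {

    /** Shared getters; getRow(pos, numFields) follows the Flink/Paimon convention (an assumption here). */
    interface Getters {
        boolean isNullAt(int pos);
        int getInt(int pos);
        String getString(int pos);
        Row getRow(int pos, int numFields);
        Array getArray(int pos);
    }

    interface Row extends Getters {
        int getFieldCount();
    }

    interface Array extends Getters {
        int size();
    }

    /** Object[]-backed getters reused by both rows and arrays. */
    abstract static class ObjectBacked implements Getters {
        final Object[] values;

        ObjectBacked(Object[] values) { this.values = values; }

        public boolean isNullAt(int pos) { return values[pos] == null; }
        public int getInt(int pos) { return (Integer) values[pos]; }
        public String getString(int pos) { return (String) values[pos]; }
        public Array getArray(int pos) { return (Array) values[pos]; }

        public Row getRow(int pos, int numFields) {
            // Callers are expected to check isNullAt(pos) first, as with the other getters.
            Row row = (Row) values[pos];
            // Binary row formats typically need numFields to locate the nested fields;
            // this object-backed sketch only uses it as a sanity check.
            if (row.getFieldCount() != numFields) {
                throw new IllegalArgumentException(
                        "expected " + numFields + " fields, got " + row.getFieldCount());
            }
            return row;
        }
    }

    static final class GenericRow extends ObjectBacked implements Row {
        GenericRow(Object... values) { super(values); }
        public int getFieldCount() { return values.length; }
    }

    static final class GenericArray extends ObjectBacked implements Array {
        GenericArray(Object... values) { super(values); }
        public int size() { return values.length; }
    }

    public static void main(String[] args) {
        // ROW<id INT, items ARRAY<ROW<qty INT, name STRING>>>
        Row order = new GenericRow(
                1,
                new GenericArray(new GenericRow(2, "apple"), new GenericRow(5, null)));
        Array items = order.getArray(1);
        for (int i = 0; i < items.size(); i++) {
            Row item = items.getRow(i, 2); // same getter on an array as on a row
            String name = item.isNullAt(1) ? "<null>" : item.getString(1);
            System.out.println(name + " x" + item.getInt(0));
        }
        // prints "apple x2" then "<null> x5"
    }
}
```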
   
   **Connector Integration:**
   - Added `FlinkRowConverter` and `FlinkArrayConverter` for Flink-Fluss type conversion
   - Added `PaimonRowAsFlussRow` and `PaimonArrayAsFlussArray` for Paimon-Fluss type conversion
   - Updated existing converters to support nested structures
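
Nested converters are commonly built once per schema and applied per record: each ROW or ARRAY converter holds the converters of its children, and null handling is applied at every level. The sketch below shows that pattern with a hypothetical type model (`NestedConverterSketch`, `IntType`, `StringType`, `ArrayType`, `RowType`) and hypothetical representations (rows and arrays as `Object[]`, strings converted to UTF-8 `byte[]` as a stand-in for an engine-specific string type). It is not the code of `FlinkRowConverter`, `FlinkArrayConverter`, or the Paimon adapters, and it assumes Java 17+.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.Function;

// Hypothetical recursive converter; not the Fluss Flink/Paimon converter code.
final class NestedConverterSketch {

    /** Minimal logical type model, only what the example needs. */
    interface Type {}
    record IntType() implements Type {}
    record StringType() implements Type {}
    record ArrayType(Type element) implements Type {}
    record RowType(List<Type> fields) implements Type {}

    /** Builds the converter tree once; nested types get nested converters. */
    static Function<Object, Object> create(Type type) {
        final Function<Object, Object> nonNull;
        if (type instanceof IntType) {
            nonNull = v -> v; // INT needs no conversion in this sketch
        } else if (type instanceof StringType) {
            nonNull = v -> ((String) v).getBytes(StandardCharsets.UTF_8);
        } else if (type instanceof ArrayType a) {
            Function<Object, Object> element = create(a.element());
            nonNull = v -> {
                Object[] in = (Object[]) v;
                Object[] out = new Object[in.length];
                for (int i = 0; i < in.length; i++) out[i] = element.apply(in[i]);
                return out;
            };
        } else if (type instanceof RowType r) {
            List<Function<Object, Object>> fields =
                    r.fields().stream().map(NestedConverterSketch::create).toList();
            nonNull = v -> {
                Object[] in = (Object[]) v;
                Object[] out = new Object[fields.size()];
                for (int i = 0; i < out.length; i++) out[i] = fields.get(i).apply(in[i]);
                return out;
            };
        } else {
            throw new IllegalArgumentException("unsupported type: " + type);
        }
        // Null check at every level: null elements, fields and nested rows stay null.
        return v -> v == null ? null : nonNull.apply(v);
    }

    public static void main(String[] args) {
        // ROW<id INT, info ROW<code INT, label STRING>, tags ARRAY<STRING>>
        Type type = new RowType(List.of(
                new IntType(),
                new RowType(List.of(new IntType(), new StringType())),
                new ArrayType(new StringType())));
        Function<Object, Object> converter = create(type); // built once, reused per record
        Object[] out = (Object[]) converter.apply(
                new Object[] {1, new Object[] {2, "x"}, new Object[] {"a", null}});
        Object[] info = (Object[]) out[1];
        System.out.println(new String((byte[]) info[1], StandardCharsets.UTF_8)); // x
        System.out.println(((Object[]) out[2])[1]); // null
    }
}
```

Building the tree up front keeps the per-record path free of type dispatch, which matters when the same schema is converted millions of times.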
   
   **Test Coverage:**
   - Enhanced `ArrowReaderWriterTest` with nested ROW and nested ARRAY test cases
   - Updated `IndexedRowTest` and `IndexedRowReaderTest` for ROW type validation
   
   With this change, Fluss can store and process nested data structures, which many analytics and data-modeling workloads rely on.
   
   ### Tests
   
   This PR includes the following unit tests to verify nested ROW and ARRAY type support:
   
   **Unit Tests:**
   - `ArrowReaderWriterTest#testReaderWriter()` - Validates that the Arrow reader and writer correctly handle nested ROW and nested ARRAY types
     - Tests nested ARRAY: `ARRAY(ARRAY(STRING))`
     - Tests nested ROW: `ROW(INT, ROW(INT, STRING, BIGINT), STRING)`
     - Verifies null handling in nested structures
     - Validates serialization and deserialization of complex nested types
   
   - `IndexedRowReaderTest` - Verifies IndexedRow format support for ROW type read/write operations
   
   - `IndexedRowTest` - Validates IndexedRow handling of nested types in various scenarios
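
To make the row-oriented side concrete, here is a minimal round-trip sketch for the `ROW(INT, ROW(INT, STRING, BIGINT), STRING)` shape used in the tests above, with null fields at both levels. It assumes a simple hypothetical layout (per-field null flags, then the non-null values, with a nested row encoded recursively and stored as a length-prefixed blob) built on `java.io.DataOutputStream`; it is an illustration only, not the actual COMPACTED or INDEXED layout, and it assumes Java 17+.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical row-oriented encoding of nested rows; NOT the Fluss COMPACTED or INDEXED layout.
final class NestedRowCodecSketch {

    enum Kind { INT, BIGINT, STRING, ROW }

    /** Type node; `fields` is only non-empty when kind == ROW. */
    record Type(Kind kind, List<Type> fields) {
        static Type of(Kind kind) { return new Type(kind, List.of()); }
        static Type row(Type... fields) { return new Type(Kind.ROW, List.of(fields)); }
    }

    /** Layout: one null flag per field, then the non-null values in field order. */
    static byte[] encode(Type rowType, Object[] row) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        List<Type> fields = rowType.fields();
        for (int i = 0; i < fields.size(); i++) out.writeBoolean(row[i] == null);
        for (int i = 0; i < fields.size(); i++) {
            if (row[i] == null) continue;
            switch (fields.get(i).kind()) {
                case INT -> out.writeInt((Integer) row[i]);
                case BIGINT -> out.writeLong((Long) row[i]);
                case STRING -> writeBlob(out, ((String) row[i]).getBytes(StandardCharsets.UTF_8));
                // A nested row is encoded recursively and stored as a length-prefixed blob.
                case ROW -> writeBlob(out, encode(fields.get(i), (Object[]) row[i]));
            }
        }
        out.flush();
        return bytes.toByteArray();
    }

    static Object[] decode(Type rowType, byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        List<Type> fields = rowType.fields();
        boolean[] isNull = new boolean[fields.size()];
        for (int i = 0; i < isNull.length; i++) isNull[i] = in.readBoolean();
        Object[] row = new Object[fields.size()];
        for (int i = 0; i < row.length; i++) {
            if (isNull[i]) continue; // leave the slot null
            Type field = fields.get(i);
            row[i] = switch (field.kind()) {
                case INT -> in.readInt();
                case BIGINT -> in.readLong();
                case STRING -> new String(readBlob(in), StandardCharsets.UTF_8);
                case ROW -> decode(field, readBlob(in));
            };
        }
        return row;
    }

    private static void writeBlob(DataOutputStream out, byte[] blob) throws IOException {
        out.writeInt(blob.length);
        out.write(blob);
    }

    private static byte[] readBlob(DataInputStream in) throws IOException {
        byte[] blob = new byte[in.readInt()];
        in.readFully(blob);
        return blob;
    }

    public static void main(String[] args) throws IOException {
        // ROW(INT, ROW(INT, STRING, BIGINT), STRING), the shape used in ArrowReaderWriterTest
        Type inner = Type.row(Type.of(Kind.INT), Type.of(Kind.STRING), Type.of(Kind.BIGINT));
        Type outer = Type.row(Type.of(Kind.INT), inner, Type.of(Kind.STRING));
        Object[] row = {1, new Object[] {2, null, 3L}, "tail"};
        Object[] back = decode(outer, encode(outer, row));
        System.out.println(Arrays.deepToString(back)); // [1, [2, null, 3], tail]
    }
}
```

Storing the nested row as a self-contained blob lets a reader skip it or decode it lazily, at the cost of one length prefix per nested value.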
   
   **Test Coverage:**
   - Arrow format nested ROW type read/write
   - Arrow format nested ARRAY type read/write
   - Compacted format ROW type support
   - Indexed format ROW type support
   - Type conversion integration with Flink and Paimon connectors
   
   **Test Data:**
   - Multi-level nested structures with various primitive types
   - Mixed scenarios of null and non-null values
   - Comprehensive validation of all basic types within nested structures
   
   All test cases pass with `mvn clean verify`.
   
   ### API and Format
   
   This change extends the storage formats:
   - Extends ARROW format to support nested ROW type structures
   - Extends COMPACTED format to support ROW type serialization
   - Extends INDEXED format to support ROW type serialization
   - No breaking changes to existing API or storage format
   - Backward compatible with existing data
   
   ### Documentation
   
   This change introduces a new feature (nested ROW type support). Documentation is not included in this PR, per user request.
   

