JingsongLi opened a new pull request, #52:
URL: https://github.com/apache/paimon-mosaic/pull/52

   ## Summary
   - Add ARRAY (List) type support with **flattened columnar storage**: each ARRAY column is decomposed into a lengths column (INT32) and a values sub-bucket (element type), and each of the two independently benefits from DICT/CONST/PLAIN encoding
   - Nested arrays (`ARRAY<ARRAY<INT>>`) are supported via recursive decomposition; all leaf element values share a single dictionary
   - Type byte 18 for ARRAY with recursive schema serialization/deserialization
   - Full test coverage across Rust (7 tests), Java (1), Python (4), C++ (3)
   - Documentation updates: design spec (storage format, encoding) and Java/Python/C++ API examples
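
To illustrate what "recursive schema serialization" could look like, here is a minimal Python sketch. Only the value 18 for ARRAY comes from this PR; the `DataType` class, the `INT_TYPE_BYTE` value, and the byte layout (ARRAY byte followed directly by the element type) are assumptions made for illustration and may not match the actual Mosaic format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

ARRAY_TYPE_BYTE = 18  # taken from the PR summary
INT_TYPE_BYTE = 3     # hypothetical code, for illustration only


@dataclass(frozen=True)
class DataType:
    """Hypothetical schema node: a type byte plus an element type for ARRAY."""
    type_byte: int
    element: Optional["DataType"] = None  # set only when type_byte is ARRAY


def array_of(element: DataType) -> DataType:
    return DataType(ARRAY_TYPE_BYTE, element)


def serialize(dtype: DataType) -> bytes:
    # Assumed layout: an ARRAY writes its own byte, then its element type.
    if dtype.type_byte == ARRAY_TYPE_BYTE:
        return bytes([ARRAY_TYPE_BYTE]) + serialize(dtype.element)
    return bytes([dtype.type_byte])


def deserialize(buf: bytes, pos: int = 0) -> Tuple[DataType, int]:
    # Returns the decoded type and the position just past it.
    type_byte = buf[pos]
    if type_byte == ARRAY_TYPE_BYTE:
        element, pos = deserialize(buf, pos + 1)
        return DataType(ARRAY_TYPE_BYTE, element), pos
    return DataType(type_byte), pos + 1


nested = array_of(array_of(DataType(INT_TYPE_BYTE)))  # ARRAY<ARRAY<INT>>
encoded = serialize(nested)
decoded, end = deserialize(encoded)
```

Under these assumptions, each level of nesting adds one leading ARRAY byte, so the decoder needs no explicit depth field.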
   
   ## Storage Design
   ```
   ARRAY<INT> column (N rows, M total elements):
   
   Main bucket:
     └── lengths column (INT32, N entries)     ← DICT/CONST encoding
   Sub-bucket:
     └── values column (INT32, M entries)      ← DICT/CONST encoding
   
   Nested ARRAY<ARRAY<INT>>:
     Main bucket: outer lengths (INT32)
     Sub-bucket:  inner lengths (INT32) + sub-sub-bucket: leaf values (INT32)
   ```
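
The decomposition in the diagram above can be sketched in a few lines of Python. This is only a model of the idea, not the Rust implementation in this PR: the `flatten` and `unflatten` helpers are hypothetical names, and null rows and null elements are left out because the summary does not describe how they are stored.

```python
def flatten(rows):
    """Split a list of lists into a lengths column and a flat values column."""
    lengths = [len(row) for row in rows]             # N entries, one per row
    values = [item for row in rows for item in row]  # M entries in total
    return lengths, values


def unflatten(lengths, values):
    """Rebuild the rows by slicing the values column with the lengths."""
    rows, pos = [], 0
    for n in lengths:
        rows.append(values[pos:pos + n])
        pos += n
    return rows


# ARRAY<INT>: one lengths column and one values column.
rows = [[1, 2], [], [3, 4, 5]]
lengths, values = flatten(rows)

# ARRAY<ARRAY<INT>>: apply the same step again to the flattened inner arrays.
nested = [[[1], [2, 3]], [], [[4, 5, 6]]]
outer_lengths, inner_rows = flatten(nested)
inner_lengths, leaf_values = flatten(inner_rows)
```

Here `lengths` is `[2, 0, 3]` and `values` is `[1, 2, 3, 4, 5]`. For the nested case, `outer_lengths` is `[2, 0, 1]`, `inner_lengths` is `[1, 2, 3]`, and all leaves end up in one flat `leaf_values` column, which is what would let them share a single dictionary.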
   
   ## Test plan
   - [x] Rust: 7 array tests (basic, null elements, strings, nested, all-null, mixed columns, 1000-row stress)
   - [x] Java: 1 array roundtrip test with ListVector
   - [x] Python: 4 array tests (basic, null elements, strings, nested)
   - [x] C++: 3 array tests (basic, null elements, strings)
   - [x] Full existing test suite passes (226 Rust + 27 Java + 35 Python)
   - [x] Clippy: 0 warnings
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)

