[PR] [AURON #1850] Add ArrowFieldWriter and FlinkArrowWriter for basic types [auron]

via GitHub Mon, 09 Mar 2026 07:37:54 -0700


x-tong opened a new pull request, #2079:
URL: https://github.com/apache/auron/pull/2079


   # Which issue does this PR close?
   
   Partially addresses #1850 (Part 2a of the Flink RowData to Arrow conversion).
   
   # Rationale for this change
   
   Per AIP-1, the Flink integration data path requires converting Flink 
`RowData` into Arrow `VectorSchemaRoot` for export to the native engine 
(DataFusion/Rust). This PR implements the writer layer for basic types, 
following Flink's official `flink-python` Arrow implementation as requested 
during Part 1 review (#1959).
   
   This is the mirror of #2063 (Arrow→RowData reader) — together they complete 
the round-trip data path.
   
   # What changes are included in this PR?
   
   ## Commit 1: ArrowFieldWriter base class + 12 type writers (16 files, +2181 
lines)
   
   - **`ArrowFieldWriter<IN>`** — Generic abstract base class using template 
method pattern (`write()` → `doWrite()` + count++), aligned with Flink's 
`flink-python` `ArrowFieldWriter`.
   - **12 concrete writers** in `writers/` sub-package, each with 
`forRow()`/`forArray()` dual-mode factory methods:
     - Numeric: `IntWriter`, `TinyIntWriter`, `SmallIntWriter`, `BigIntWriter`, 
`FloatWriter`, `DoubleWriter`
     - Non-numeric: `BooleanWriter`, `VarCharWriter`, `VarBinaryWriter`, 
`DecimalWriter`, `DateWriter`, `NullWriter`
   - **Key design**: Each writer (except `NullWriter`) has two `public static 
final` inner classes (`XxxWriterForRow` / `XxxWriterForArray`) because Flink's 
`RowData` and `ArrayData` have no common getter interface.
   - **Special cases**:
     - `NullWriter`: No inner classes needed, `doWrite()` is empty (NullVector 
values are inherently null)
     - `DecimalWriter`: Takes precision/scale parameters, includes 
`fitBigDecimal()` validation before writing (aligned with Flink's 
`fromBigDecimal` logic)
   - **Unit tests**: `IntWriterTest` (5), `BasicWritersTest` (20), 
`NonNumericWritersTest` (12) — 37 tests
   
   ## Commit 2: FlinkArrowWriter orchestrator + factory methods (3 files, +482 
lines)
   
   - **`FlinkArrowWriter`** — Orchestrates per-column 
`ArrowFieldWriter<RowData>[]` to write Flink `RowData` into Arrow 
`VectorSchemaRoot`. Lifecycle: `create()` → `write(row)*` → `finish()` → 
`reset()`.
   - **Factory methods in `FlinkArrowUtils`** — 
`createArrowFieldWriterForRow()`/`createArrowFieldWriterForArray()` dispatch 
writer creation based on Arrow vector type (instanceof chain). Both are 
package-private.
   - **Integration tests**: `FlinkArrowWriterTest` (7) — all-types write, null 
handling, multi-row batches, reset, empty batch, zero columns, unsupported 
type. Total: **53 tests, all passing**.
   
   # Scope
   
   This PR covers basic types only. Time, Timestamp, and complex types 
(Array/Map/Row) will be in Part 2b.
   
   # Are there any user-facing changes?
   
   No. Internal API for Flink integration.
   
   # How was this patch tested?
   
   53 tests across 4 test classes:
   
   ```bash
   ./build/mvn test -Pflink-1.18 -Pspark-3.5 -Pscala-2.12 \
     -pl auron-flink-extension/auron-flink-runtime -am -DskipBuildNative
   ```
   
   Result: 53 pass, 0 failures.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[PR] [AURON #1850] Add ArrowFieldWriter and FlinkArrowWriter for basic types [auron]

Reply via email to