x-tong opened a new pull request, #2079:
URL: https://github.com/apache/auron/pull/2079
# Which issue does this PR close?
Partially addresses #1850 (Part 2a of the Flink RowData to Arrow conversion).
# Rationale for this change
Per AIP-1, the Flink integration data path requires converting Flink
`RowData` into Arrow `VectorSchemaRoot` for export to the native engine
(DataFusion/Rust). This PR implements the writer layer for basic types,
following Flink's official `flink-python` Arrow implementation as requested
during Part 1 review (#1959).
This is the mirror of #2063 (Arrow→RowData reader) — together they complete
the round-trip data path.
# What changes are included in this PR?
## Commit 1: ArrowFieldWriter base class + 12 type writers (16 files, +2181
lines)
- **`ArrowFieldWriter<IN>`** — Generic abstract base class using template
method pattern (`write()` → `doWrite()` + count++), aligned with Flink's
`flink-python` `ArrowFieldWriter`.
- **12 concrete writers** in `writers/` sub-package, each with
`forRow()`/`forArray()` dual-mode factory methods:
- Numeric: `IntWriter`, `TinyIntWriter`, `SmallIntWriter`, `BigIntWriter`,
`FloatWriter`, `DoubleWriter`
- Non-numeric: `BooleanWriter`, `VarCharWriter`, `VarBinaryWriter`,
`DecimalWriter`, `DateWriter`, `NullWriter`
- **Key design**: Each writer (except `NullWriter`) has two `public static
final` inner classes (`XxxWriterForRow` / `XxxWriterForArray`) because Flink's
`RowData` and `ArrayData` have no common getter interface.
- **Special cases**:
- `NullWriter`: No inner classes needed, `doWrite()` is empty (NullVector
values are inherently null)
- `DecimalWriter`: Takes precision/scale parameters, includes
`fitBigDecimal()` validation before writing (aligned with Flink's
`fromBigDecimal` logic)
- **Unit tests**: `IntWriterTest` (5), `BasicWritersTest` (20),
`NonNumericWritersTest` (12) — 37 tests
## Commit 2: FlinkArrowWriter orchestrator + factory methods (3 files, +482
lines)
- **`FlinkArrowWriter`** — Orchestrates per-column
`ArrowFieldWriter<RowData>[]` to write Flink `RowData` into Arrow
`VectorSchemaRoot`. Lifecycle: `create()` → `write(row)*` → `finish()` →
`reset()`.
- **Factory methods in `FlinkArrowUtils`** —
`createArrowFieldWriterForRow()`/`createArrowFieldWriterForArray()` dispatch
writer creation based on Arrow vector type (instanceof chain). Both are
package-private.
- **Integration tests**: `FlinkArrowWriterTest` (7) — all-types write, null
handling, multi-row batches, reset, empty batch, zero columns, unsupported
type. Total: **53 tests, all passing**.
# Scope
This PR covers basic types only. Time, Timestamp, and complex types
(Array/Map/Row) will be in Part 2b.
# Are there any user-facing changes?
No. Internal API for Flink integration.
# How was this patch tested?
53 tests across 4 test classes:
```bash
./build/mvn test -Pflink-1.18 -Pspark-3.5 -Pscala-2.12 \
-pl auron-flink-extension/auron-flink-runtime -am -DskipBuildNative
```
Result: 53 pass, 0 failures.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]