zeroshade opened a new pull request, #852:
URL: https://github.com/apache/arrow-go/pull/852
### Rationale for this change
Closes #826.
Across all 8 generated `*ColumnChunkWriter` types in
`parquet/file/column_writer_types.gen.go`, `WriteBatch` and `WriteBatchSpaced`
are an asymmetric pair:
- `WriteBatch` returns `(valueOffset int64, err error)` and opens with a
`defer recover()` that converts any panic into a returned `error` via
`utils.FormatRecoveredError`.
- `WriteBatchSpaced` returns nothing, has no `defer recover()`, and on the
`doBatches` (non-byte-array) path silently discards the `error` returned by
`commitWriteAndCheckPageLimit`.
So a write failure during a spaced write is either **silently discarded**
(numeric/boolean/fixed paths) or **escapes as a panic out of the calling
goroutine** (the byte-array branch's `panic(err)`, plus other internal panics
such as dictionary-fallback bookkeeping) — because, unlike `WriteBatch`, there
is no `recover()` to turn it into an `error`.
This is the same class of "panic escapes / error lost on the public API" bug
as #820 (fixed in #824), on a different code path. It is reachable through
`pqarrow` — `(*pqarrow.FileWriter)` drives `WriteBatchSpaced` for **nullable**
columns — so for consumers writing nullable columns to a network-attached sink
(GCS, S3 multipart, HDFS, etc.), a transient downstream failure on the spaced
path crashes the worker process or silently drops the error instead of
returning it.
### What changes are included in this PR?
1. **New method `WriteBatchSpacedWithError(values, defLevels, repLevels,
validBits, validBitsOffset) (int64, error)`** generated for all 8 column-writer
types from `column_writer_types.gen.go.tmpl`. It mirrors `WriteBatch`:
- opens with `defer func() { if r := recover(); r != nil { err =
utils.FormatRecoveredError(...) } }()`, and
- checks the `error` from `commitWriteAndCheckPageLimit` on **every**
branch (the `doBatches` branch previously ignored it), `panic`-ing so the
deferred recover surfaces it as a returned `error`.
2. **`WriteBatchSpaced` is left byte-for-byte unchanged** — same signature,
same behavior. This keeps the change **non-breaking and purely additive**,
exactly the approach #824 took with `NewParquetWriter` /
`NewParquetWriterWithError`. Existing callers of `WriteBatchSpaced` are
unaffected; callers that want error handling opt into
`WriteBatchSpacedWithError`.
3. **`pqarrow`'s dense-array encoder (`writeDenseArrow`) now calls
`WriteBatchSpacedWithError`** at all 12 spaced-write sites and propagates the
error through its existing `(err error)` return, so nullable-column writes
report sink failures as errors instead of crashing the goroutine or losing the
error.
The diff to the generated file and the template is additions-only (no
existing generated method bodies change).
### Are these changes tested?
Yes. New `TestWriteBatchSpacedWithErrorPropagatesWriteFailure` in
`parquet/file/column_writer_test.go` uses the existing `mockpagewriter` to fail
`WriteDataPage`, with a tiny `DataPageSize` and dictionary disabled to force a
flush mid-call. It asserts:
- `WriteBatchSpacedWithError` returns the wrapped failure (`errors.Is(err,
failureErr)`), rather than panicking or discarding it; and
- the legacy `WriteBatchSpaced` does **not** panic out of the call (its
original behavior is preserved).
The full `parquet/file/...` and `parquet/pqarrow/...` suites pass
(`PARQUET_TEST_DATA` pointed at `parquet-testing/data`), and `go vet` is clean.
### Are there any user-facing changes?
**No breaking changes.**
- `WriteBatchSpaced(...)` — signature and behavior unchanged.
- `WriteBatchSpacedWithError(...) (int64, error)` — new exported method on
each generated column-writer type. Adding exported methods is non-breaking in
Go.
- `pqarrow` public signatures are unchanged; the dense-array nullable write
path now returns transient sink failures as an `error` (callers already
checking `err` get strictly better behavior).
### Out of scope (separate follow-up)
`WriteBitmapBatchSpaced` (Boolean) has the same void-signature / no-recover
shape; it is not on the `WriteBatchSpaced` path this issue targets and can be
addressed separately if desired.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]