zeroshade opened a new pull request, #824: URL: https://github.com/apache/arrow-go/pull/824
### Rationale for this change Closes #820. `(*file.Writer).startFile()` panics with the literal string `"failed to write magic number"` when the underlying sink's first `Write` call returns an error or short-writes. Because `file.NewParquetWriter` calls `startFile` synchronously inside the constructor and has no `error` return, callers had no idiomatic Go recovery path. The higher-level `pqarrow.NewFileWriter` already returns `(*FileWriter, error)`, but the panic propagated through it unchanged, so its `error` return was never reached for this failure mode. In production this manifests when the sink is a network-attached writer (`*storage.Writer` from GCS, an S3 multipart upload, an HDFS client, etc.). A transient network blip on the first 4-byte magic-header write reliably crashes the worker process, orphans any in-flight upload session, and forces a container restart. ### What changes are included in this PR? 1. **`(*Writer).startFile()` now returns `error`** instead of panicking on sink-write failures. The configuration-validation panic for "encrypted column not found in file schema" is preserved — that path is a programmer error, not an I/O failure. 2. **New public constructor `file.NewParquetWriterWithError(w, sc, opts...) (*Writer, error)`** is the preferred entry point for callers whose sink may transiently fail. 3. **`file.NewParquetWriter` is now a thin wrapper** over `NewParquetWriterWithError` that panics with the *exact same string* (`"failed to write magic number"`) on init failure. Any consumer with a `recover()` block that string-matches the panic value continues to work unchanged. 4. **`pqarrow.NewFileWriter` switches to `NewParquetWriterWithError`** internally. Its public signature is unchanged; its `error` return now meaningfully covers transient sink failures, with no API change for any of its callers. ### Are these changes tested? Yes. Six new tests in `parquet/file/file_writer_test.go` exercise the new behavior using a `flakyMagicSink` test helper modeled on the existing `errCloseWriter` pattern in the same file: - `TestNewParquetWriterWithError_Success` — happy path on a `bytes.Buffer`. - `TestNewParquetWriterWithError_FirstWriteFails` — first `Write` returns `io.ErrUnexpectedEOF`; constructor returns a wrapped error and does not panic. - `TestNewParquetWriterWithError_FirstWriteShortWrites` — first `Write` returns `n=len(p)-1, nil`; constructor returns an error mentioning the short write. - `TestNewParquetWriter_PreservesPanicMessage` — the legacy constructor still panics, the panic value is still a `string`, and the message is still the literal `"failed to write magic number"` (back-compat guard). - `TestPqarrowNewFileWriter_PropagatesInitError` — `pqarrow.NewFileWriter` returns the wrapped sink error rather than panicking. - `TestPqarrowNewFileWriter_PropagatesShortWrite` — same for the short-write case. The existing `TestCloseError`, full `parquet/file/...` and `parquet/pqarrow/...` test suites all pass (with `PARQUET_TEST_DATA` pointed at `parquet-testing/data`). ### Are there any user-facing changes? **No breaking changes.** - `file.NewParquetWriter(w, sc, opts...) *Writer` — signature unchanged, still panics on first-write failure with the identical literal string `"failed to write magic number"`. Existing `recover()` workarounds keep working. - `file.NewParquetWriterWithError(w, sc, opts...) (*Writer, error)` — new exported symbol. Adding an exported function is non-breaking in Go. - `pqarrow.NewFileWriter(...) (*FileWriter, error)` — signature unchanged. Its `error` return now also covers the magic-header-write failure that previously escaped as a panic. Callers who already check `err` get strictly better behavior. The new constructor's docstring is the recommended path for users on cloud-storage upload sinks. The legacy constructor's docstring now documents the panic behavior the original issue called out as undocumented. ### Out of scope (separate follow-up) While investigating, I noticed `WriteBatchSpaced` in `parquet/file/column_writer_types.gen.go` lacks the `defer recover()` that `WriteBatch` has, so an I/O error during a spaced write panics out of the calling goroutine instead of returning an error. Same class of bug as #820 but a different code path; happy to file a separate issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
