The GitHub Actions job "Build and push images" on texera.git/main has failed.
Run started by GitHub user bobbai00 (triggered by bobbai00).

Head commit for run:
59f776c853d3b18a6e385ed9b27b7d4aafc835de / Meng Wang <[email protected]>
fix(workflow-operator): auto-rename empty CSV column headers (#5282)

### What changes were proposed in this PR?

A CSV with an empty column header (e.g. a trailing comma `id,name,,age`)
produced an `Attribute` with an empty name. After #3295 every output
port writes via `IcebergTableWriter → Parquet`, where Avro rejects empty
names with `IllegalArgumentException: Empty name` at flush time — losing
the operator port's entire result.

Rename blank header positions to `column-<index>` (the convention used
by pandas, Spark, R, and DuckDB) in all three CSV scan operators:
`CSVScanSourceOpDesc`, `ParallelCSVScanSourceOpDesc`, and
`CSVOldScanSourceOpDesc`. The issue named the first two; the legacy
`csvOld` variant is still registered in `LogicalOp` and had the same
latent bug.

Note: `ParallelCSVScanSourceOpDesc` is currently commented out of
`LogicalOp`'s operator registry (so it is not reachable from the UI),
but it is fixed here for consistency and so the bug does not resurface
if it is re-enabled for experiments.

### Any related issues, documentation, discussions?

Closes #5279.

### How was this PR tested?

Added three cases to `CSVScanSourceOpDescSpec` — one per operator —
feeding a CSV with an empty third header (`id,name,,age`) and asserting
the inferred schema is `["id", "name", "column-3", "age"]`.
`WorkflowOperator` suite is green (13 passed); scalafmt clean.

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (claude-opus-4-7)

Report URL: https://github.com/apache/texera/actions/runs/26615494992

With regards,
GitHub Actions via GitBox

Reply via email to