aglinxinyuan opened a new pull request, #5900:
URL: https://github.com/apache/texera/pull/5900
### What changes were proposed in this PR?
Extends the cross-region **State materialization** format from a single
`content` column to **4 columns**: `content`, `loop_counter`, `loop_start_id`,
and `loop_start_state_uri`. Loop bookkeeping is promoted to first-class columns
and is never stored inside the content JSON. The transport carries the columns
end to end: the `OutputManager` state writer and `emit_state`, the Python
network sender/receiver, the materialization reader, and the Scala
`state.toTuple` call sites.
**Dormant on `main`** — nothing observable changes without the loop
operators:
- `to_tuple()` / `toTuple()` and
`OutputManager.save_state_to_storage_if_needed` / `emit_state` default the
three loop columns to `0` / `""`, so every existing non-loop caller is
unchanged.
- `from_tuple` / `fromTuple` read only the `content` column, so round-trip
identity is preserved and the extra columns are inert.
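
A minimal sketch of the behavior described in the two bullets above, not the code in this PR. `SketchState` is a hypothetical stand-in for the real State class, and the column types (an integer `loop_counter`, string `loop_start_id` and `loop_start_state_uri`) are assumptions inferred from the `0` / `""` defaults. Only the four column names come from this description.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict

# Column names as listed in the PR description; the types are assumed.
STATE_COLUMNS = ("content", "loop_counter", "loop_start_id", "loop_start_state_uri")


@dataclass
class SketchState:
    """Hypothetical stand-in for the engine's State class."""

    data: Dict[str, Any] = field(default_factory=dict)

    def to_tuple(
        self,
        loop_counter: int = 0,
        loop_start_id: str = "",
        loop_start_state_uri: str = "",
    ) -> Dict[str, Any]:
        # Loop bookkeeping goes into its own columns, never into the content JSON.
        return {
            "content": json.dumps(self.data, sort_keys=True),
            "loop_counter": loop_counter,
            "loop_start_id": loop_start_id,
            "loop_start_state_uri": loop_start_state_uri,
        }

    @classmethod
    def from_tuple(cls, row: Dict[str, Any]) -> "SketchState":
        # Only `content` is read, so the loop columns are inert on the way back.
        return cls(json.loads(row["content"]))


state = SketchState({"epoch": 3})
row = state.to_tuple()  # a non-loop caller passes no loop arguments
looped_row = state.to_tuple(loop_counter=2, loop_start_id="loop-start-1")
```

In this sketch both `row` and `looped_row` decode back to an equal `SketchState`, which is the round-trip identity the second bullet relies on.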
No backward-compatible read of old 1-column State is needed, because State
materialization is **intra-execution only**. The Iceberg state-document URI is
execution-scoped (`…/eid/{executionId}/`) and recreated fresh on each run, and
State tuples are never replayed across executions or engine versions, so a
1-column tuple can never reach the 4-column reader.
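
The scoping argument can be illustrated with a small sketch. `state_document_uri` and the `"<base>"` placeholder are hypothetical: the real prefix is elided in this description and is not reconstructed here. Only the `/eid/{executionId}/` segment comes from the text above.

```python
def state_document_uri(base: str, execution_id: int) -> str:
    # `base` stands in for the elided prefix; only the /eid/<id>/ segment
    # is taken from the PR description.
    return f"{base.rstrip('/')}/eid/{execution_id}/"


run_a = state_document_uri("<base>", 41)
run_b = state_document_uri("<base>", 42)

# Each execution writes under its own prefix, so neither location is
# contained in the other and a reader for one run never sees the other's tuples.
disjoint = not run_a.startswith(run_b) and not run_b.startswith(run_a)
```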
This is the state-format prerequisite the loop operators build on; the
columns carry non-default values only once Loop Start/End set them (follow-up
PR).
### Any related issues, documentation, discussions?
Extracted from #5700 (loop operators) per @Xiao-zhen-Liu's split request;
part of #4442 ("Introduce for loop").
### How was this PR tested?
- **Format / round-trip:** `test_state.py` (`loop_counter` is its own column,
is never in the content JSON, and defaults to `0`), Scala `StateSpec`,
`ArrowUtilsSpec` (4-column Arrow vector round-trip), and `IcebergDocumentSpec`
(Iceberg state-document round-trip).
- **Transport:** `test_network_receiver.py`,
`test_input_port_materialization_reader_runnable.py`, and
`test_state_materialization_e2e.py` (hermetic sqlite catalog).
- **Dormancy:** new
`test_output_manager.py::test_defaults_loop_columns_when_omitted` pins that a
no-loop caller (no `loop_counter`) still produces a valid 4-column tuple with
the loop columns at `0` / `""`.
- Local: `workflow-core` + `amber` compile; `StateSpec` + `ArrowUtilsSpec`
pass; 26 Python state tests pass; scalafmt + scalafix + black clean.
(`IcebergDocumentSpec` needs the Iceberg catalog backend, so it runs in CI.)
### Was this PR authored or co-authored using generative AI tooling?
Co-authored with Claude Opus 4.8 in compliance with ASF.