aglinxinyuan opened a new pull request, #5900:
URL: https://github.com/apache/texera/pull/5900

   ### What changes were proposed in this PR?
   
   Extends the cross-region **State materialization** format from a single 
`content` column to **4 columns** (`content`, `loop_counter`, `loop_start_id`, 
`loop_start_state_uri`), promoting loop bookkeeping to first-class columns that 
are never embedded in the `content` JSON. The transport carries the new columns 
end to end: the `OutputManager` state writer and `emit_state`, the Python 
network sender/receiver, the materialization reader, and the Scala 
`state.toTuple` call sites.
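
   To make the layout concrete, here is a minimal Python sketch of one State row under the new format. The helper `state_to_row`, the column tuple, and the sample values are hypothetical, and the sketch assumes `loop_start_id` and `loop_start_state_uri` are strings (consistent with their `""` defaults). It is not Texera's actual `to_tuple` implementation.

```python
import json
from typing import Any, Dict

# Column order as listed in this PR; the helper below is a hypothetical
# stand-in for the real State serialization, not Texera's API.
STATE_COLUMNS = ("content", "loop_counter", "loop_start_id", "loop_start_state_uri")


def state_to_row(
    state: Dict[str, Any],
    loop_counter: int,
    loop_start_id: str,
    loop_start_state_uri: str,
) -> Dict[str, Any]:
    """Serialize user state into `content`; loop bookkeeping gets its own columns."""
    return {
        "content": json.dumps(state, sort_keys=True),
        "loop_counter": loop_counter,
        "loop_start_id": loop_start_id,
        "loop_start_state_uri": loop_start_state_uri,
    }


row = state_to_row({"threshold": 5}, 3, "loop-start-1", "file:///tmp/example-state")
assert tuple(row) == STATE_COLUMNS
# The loop metadata lives beside the payload, never inside the content JSON.
assert "loop_counter" not in json.loads(row["content"])
```

   In this sketch, keeping the bookkeeping out of `content` leaves the JSON payload independent of loop state, so a reader interested only in the state can ignore the other three columns.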
   
   **Dormant on `main`.** Nothing observable changes without the loop 
operators:
   
   - `to_tuple()` / `toTuple()` and 
`OutputManager.save_state_to_storage_if_needed` / `emit_state` default the 
three loop columns to `0` / `""`, so every existing non-loop caller is 
unchanged.
   - `from_tuple` / `fromTuple` read only the `content` column, so round-trip 
identity is preserved and the extra columns are inert.
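
   A minimal sketch of the two dormancy properties above, using hypothetical helpers (`to_state_row`, `from_state_row`) as stand-ins for `to_tuple()` / `from_tuple`: the loop columns take their defaults when a caller omits them, and decoding reads only `content`, so a round trip returns the original state whatever the loop columns hold.

```python
import json
from typing import Any, Dict


def to_state_row(
    state: Dict[str, Any],
    loop_counter: int = 0,
    loop_start_id: str = "",
    loop_start_state_uri: str = "",
) -> Dict[str, Any]:
    """Hypothetical encoder: non-loop callers can omit the three loop arguments."""
    return {
        "content": json.dumps(state),
        "loop_counter": loop_counter,
        "loop_start_id": loop_start_id,
        "loop_start_state_uri": loop_start_state_uri,
    }


def from_state_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical decoder: reads only `content`, so the loop columns are inert."""
    return json.loads(row["content"])


# An existing non-loop caller passes only the state and gets the defaults.
legacy = to_state_row({"count": 10})
assert (legacy["loop_counter"], legacy["loop_start_id"], legacy["loop_start_state_uri"]) == (0, "", "")

# Round-trip identity holds with default and non-default loop columns alike.
looped = to_state_row({"count": 10}, 2, "loop-start-1", "file:///tmp/example-state")
assert from_state_row(legacy) == from_state_row(looped) == {"count": 10}
```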
   
   No backward-compatible reader for the old 1-column State format is needed. 
State materialization is **intra-execution only**: the Iceberg state-document 
URI is execution-scoped (`…/eid/{executionId}/`) and recreated fresh on each 
run, and State tuples are never replayed across executions or engine versions, 
so a 1-column tuple can never reach the 4-column reader.
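
   The scoping argument can be illustrated with a small sketch. `state_document_uri` is hypothetical and `base_uri` is an opaque placeholder for the elided `…` prefix; the only detail taken from the text is the `/eid/{executionId}/` segment. Because each execution gets its own prefix, a reader in one execution only ever sees State written by that same execution.

```python
def state_document_uri(base_uri: str, execution_id: int) -> str:
    """Hypothetical helper: place state documents under an execution-scoped prefix."""
    # The trailing slash keeps eid/1/ from being a string prefix of eid/12/.
    return f"{base_uri.rstrip('/')}/eid/{execution_id}/"


base = "file:///tmp/example-workflow"  # placeholder, not a real Texera path
run_1 = state_document_uri(base, 1)
run_12 = state_document_uri(base, 12)

assert run_1 != run_12
# No execution's prefix covers another's, so a new run never reads an old run's State.
assert not run_12.startswith(run_1)
```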
   
   This is the state-format prerequisite that the loop operators build on; the 
loop columns carry non-default values only once Loop Start/End set them (in a 
follow-up PR).
   
   ### Any related issues, documentation, discussions?
   
   Extracted from #5700 (loop operators) per @Xiao-zhen-Liu's split request; 
part of #4442 ("Introduce for loop").
   
   ### How was this PR tested?
   
   - **Format / round-trip:** `test_state.py` (`loop_counter` is its own column, 
never in the content JSON, and defaults to `0`), Scala `StateSpec`, 
`ArrowUtilsSpec` (4-column Arrow vector round-trip), and `IcebergDocumentSpec` 
(Iceberg state-document round-trip).
   - **Transport:** `test_network_receiver.py`, 
`test_input_port_materialization_reader_runnable.py`, and 
`test_state_materialization_e2e.py` (hermetic SQLite catalog).
   - **Dormancy:** new 
`test_output_manager.py::test_defaults_loop_columns_when_omitted` pins that a 
no-loop caller (no `loop_counter`) still produces a valid 4-column tuple with 
the loop columns at `0` / `""`.
   - Local: `workflow-core` and `amber` compile; `StateSpec` and 
`ArrowUtilsSpec` pass; 26 Python state tests pass; scalafmt, scalafix, and 
black are clean. (`IcebergDocumentSpec` needs the Iceberg catalog backend, so 
it runs in CI.)
   
   ### Was this PR authored or co-authored using generative AI tooling?
   
   Co-authored with Claude Opus 4.8, in compliance with ASF policy.
   