aglinxinyuan commented on code in PR #4206:
URL: https://github.com/apache/texera/pull/4206#discussion_r3308859717
##########
amber/src/main/python/core/architecture/packaging/output_manager.py:
##########
@@ -217,6 +225,19 @@ def save_state_to_storage_if_needed(self, state: State,
port_id=None) -> None:
elif port_id in self._port_state_writers:
self._port_state_writers[port_id][0].put(element)
+ def reset_storage(self) -> None:
Review Comment:
Addressed in e6bea518f2. (The method is now `reset_output_storage` after an
earlier rename, and on the current branch it recreates only the single output
result table; the state table is handled separately in
`save_state_to_storage_if_needed`.)
* **Docstring**: it now states what the method does (drops and recreates the
single output table, closing the old writer beforehand and opening a fresh one
afterwards) and that only a Loop End worker calls it, once per iteration. It
also records the reasoning that previously lived only in the PR description,
namely *why* truncating live storage is safe: a loop runs in MATERIALIZED mode,
so downstream operators don't read the table until the loop has finished, and
no reader observes the intermediate truncation.
* **Preconditions checked**: the two previously implicit assumptions are now
verified, and a violation raises a clear `RuntimeError` instead of silently
resetting the wrong port or surfacing a bare `KeyError`: (1) there is exactly
one output port, and (2) `set_up_port_storage_writer` has already run for it.
* **Tests**: the new `test_output_manager.py` covers the happy path (recreate
bracketed by close→reopen) and both guard failures, with the Iceberg and thread
collaborators mocked so it stays hermetic.
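
For readers skimming the thread, here is a minimal sketch of the shape described in the bullets above: two guards that raise `RuntimeError`, then close, recreate, reopen. It is not the code from e6bea518f2. The class `OutputManagerSketch`, the attributes `_output_port_ids` and `_port_result_writers`, the `add_output_port` helper, and the injected `open_writer` / `recreate_table` callables are all hypothetical stand-ins, and the real method may differ in signature and bookkeeping.

```python
from typing import Any, Callable, Dict, List, Tuple


class OutputManagerSketch:
    """Hypothetical stand-in for OutputManager, for illustration only."""

    def __init__(
        self,
        open_writer: Callable[[str], Any],
        recreate_table: Callable[[str], None],
    ) -> None:
        # Assumed bookkeeping: output port ids, plus port_id -> (writer, uri).
        self._output_port_ids: List[str] = []
        self._port_result_writers: Dict[str, Tuple[Any, str]] = {}
        self._open_writer = open_writer
        self._recreate_table = recreate_table

    def add_output_port(self, port_id: str) -> None:
        self._output_port_ids.append(port_id)

    def set_up_port_storage_writer(self, port_id: str, uri: str) -> None:
        self._port_result_writers[port_id] = (self._open_writer(uri), uri)

    def reset_output_storage(self) -> None:
        # Guard 1: exactly one output port, so the wrong port is never reset.
        if len(self._output_port_ids) != 1:
            raise RuntimeError(
                f"expected exactly one output port, "
                f"found {len(self._output_port_ids)}"
            )
        port_id = self._output_port_ids[0]
        # Guard 2: the writer must already exist (no bare KeyError).
        if port_id not in self._port_result_writers:
            raise RuntimeError(
                f"storage writer not set up for port {port_id!r}"
            )
        old_writer, uri = self._port_result_writers[port_id]
        old_writer.close()         # close the old writer first
        self._recreate_table(uri)  # drop and recreate the single output table
        # Open a fresh writer against the recreated table.
        self._port_result_writers[port_id] = (self._open_writer(uri), uri)


# Usage with in-memory fakes that record the order of calls.
log: List[str] = []


class FakeWriter:
    def close(self) -> None:
        log.append("close")


def fake_open(uri: str) -> FakeWriter:
    log.append("open")
    return FakeWriter()


manager = OutputManagerSketch(fake_open, lambda uri: log.append("recreate"))
manager.add_output_port("out-0")
manager.set_up_port_storage_writer("out-0", "mem://result")
manager.reset_output_storage()
print(log)  # ['open', 'close', 'recreate', 'open']
```

Injecting the storage operations as callables is only a convenience for this sketch: it lets the close→recreate→reopen ordering be observed without touching real storage, which is the same property the mocked tests rely on.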
On the location/naming: it stays on `OutputManager` because it operates
entirely on that class's private writer/URI state; the docstring now makes the
single-caller, Loop-End-only usage explicit so the general location doesn't
mislead.