Xiao-zhen-Liu commented on code in PR #4424:
URL: https://github.com/apache/texera/pull/4424#discussion_r3139794598


##########
amber/src/main/python/core/runnables/main_loop.py:
##########
@@ -329,7 +330,7 @@ def _process_ecm(self, ecm_element: ECMElement):
 
             if ecm.ecm_type != EmbeddedControlMessageType.NO_ALIGNMENT:
                 self.context.pause_manager.resume(PauseType.ECM_PAUSE)
-
+            self._switch_context()

Review Comment:
   I think this `_switch_context()` can break ECM handling because it gives 
`DataProcessor` a chance to consume `current_internal_marker` before `MainLoop` 
checks it below.
   
   For `StartChannel` / `EndChannel`, the control handler sets 
`tuple_processing_manager.current_internal_marker`. But `DataProcessor.run()` 
reads it via `get_internal_marker()`, which also clears the field. So with this 
new switch, the sequence can become:
   
   1. MainLoop handles the ECM command and sets `current_internal_marker`.
   2. MainLoop switches to DataProcessor.
   3. DataProcessor calls `get_internal_marker()` and clears the marker.
   4. MainLoop resumes and sees `current_internal_marker is None`.
   5. MainLoop skips `_process_start_channel()` / `_process_end_channel()`.
   
   That is risky because those MainLoop methods do more than just invoke the 
executor hook: `_process_end_channel()` also emits finish state/tuples, sends 
port completion messages, forwards downstream ECMs, and completes the worker. 
DataProcessor consuming the marker first can therefore leave the lifecycle 
handling incomplete or stuck waiting for a matching MainLoop-side switch.
   
   It may be safer to avoid switching here before MainLoop has 
inspected/handled `current_internal_marker`, or to restructure the marker 
handoff so only one side is responsible for consuming it.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to