Yicong-Huang opened a new pull request, #4717:
URL: https://github.com/apache/texera/pull/4717

   ### What changes were proposed in this PR?
   
   Fix the root cause of #4705: lift `ExecutorManager`'s tmp module-name 
counter from a per-instance attribute to a process-wide `itertools.count(1)`, 
so generated module names (`udf-vN`) never collide across instances. The `if 
module_name in sys.modules:` recovery branch (clear + reload) becomes 
unreachable and is removed.
   
   ```diff
    class ExecutorManager:
   +    _module_name_counter = itertools.count(1)
   +
        def __init__(self):
            self.executor: Optional[Operator] = None
            self.operator_module_name: Optional[str] = None
   -        self.executor_version: int = 0  # incremental only
   
        def gen_module_file_name(self):
   -        self.executor_version += 1
   -        module_name = f"udf-v{self.executor_version}"
   +        module_name = f"udf-v{next(ExecutorManager._module_name_counter)}"
            file_name = f"{module_name}.py"
            return module_name, file_name
   
        def load_executor_definition(self, code):
            module_name, file_name = self.gen_module_file_name()
            with self.fs.open(file_name, "w") as file:
                file.write(code)
   -        if module_name in sys.modules:
   -            executor_module = importlib.import_module(module_name)
   -            executor_module.__dict__.clear()
   -            executor_module.__dict__["__name__"] = module_name
   -            executor_module = importlib.reload(executor_module)
   -        else:
   -            executor_module = importlib.import_module(module_name)
   +        executor_module = importlib.import_module(module_name)
            self.operator_module_name = module_name
            ...
   ```
   
   ### Why this matters
   
   In production, a Python worker is its own subprocess and holds exactly one 
`ExecutorManager`; a single instance's counter is monotonic across 
`update_executor` calls, so `udf-v1`, `udf-v2`, … already never collided. The 
clear+reload branch was effectively test-only dead code.
   
   But under pytest, dozens of fixtures construct fresh `ExecutorManager`s in 
the same Python process. Each one started at `executor_version = 0` and 
immediately tried to load `udf-v1`. The second instance found `udf-v1` already 
in `sys.modules` from `test_executor_manager.py`'s `SAMPLE_OPERATOR_CODE` 
(which defines `class TestOperator`). The clear+reload recovery silently failed 
on Python 3.11 specifically and returned the stale `TestOperator` to whichever 
spec needed `CountBatchOperator`. The visible symptom alternated between 
`AttributeError: 'TestOperator' object has no attribute 'count'` and a 1-minute 
pytest hang.
   
   A process-wide counter makes module names unique across every instance in 
the same interpreter, so the `if module_name in sys.modules` branch is never 
taken in either production or tests, and the 3.11-specific reload path is never 
exercised.
   
   ### Out of scope
   
   - Lifecycle cleanup of `sys.path` / `sys.modules` in 
`ExecutorManager.close()`. Each `udf-vN` module still survives the manager that 
loaded it, but the worker process is short-lived and `/var/tmp` is cleaned by 
the OS.
   
   ### Any related issues, documentation, discussions?
   
   Closes #4705. Indirectly unblocks the recurring backport flakes seen on PRs 
#4636, #4648, and others (where the 3.11 leg either errored on `TestOperator` 
or hung in `DataProcessingSpec`).
   
   ### How was this PR tested?
   
   ```
   sbt -no-colors -batch \
     'WorkflowExecutionService / Test / testOnly \
        
org.apache.texera.amber.engine.architecture.managers.test_executor_manager \
        org.apache.texera.amber.engine.runnables.test_main_loop'
   # 23 passed
   ```
   
   The combined run that previously interleaved `TestOperator` from 
`test_executor_manager.py` into `test_main_loop.py`'s fixtures now passes 
cleanly: every spec sees its own freshly-imported module.
   
   `test_executor_manager.py` was updated for the new semantics:
   
   - `test_initialization` no longer asserts `executor_version == 0` (the 
attribute is gone).
   - `test_accept_python_language_*_operator` now assert 
`operator_module_name.startswith("udf-v")` instead of `== "udf-v1"` — the 
absolute value depends on prior tests in the same session.
   - `test_gen_module_file_name_increments` asserts relative monotonicity 
rather than absolute values.
   
   ### Was this PR authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code (Opus 4.7, 1M context)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to