Yicong-Huang opened a new issue, #4705:
URL: https://github.com/apache/texera/issues/4705
### What happened?
`ExecutorManager` reuses fixed module names (`udf-v1`, `udf-v2`, ...) per
instance and registers them in `sys.modules` via `importlib.import_module`.
Each test fixture creates a fresh `ExecutorManager` with `executor_version =
0`, so every spec's first executor lands on the name `udf-v1`. The teardown
only closes the temp filesystem; it does not remove the entry from
`sys.modules` or pop the temp directory from `sys.path`.
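The name reuse described above can be shown in isolation. The following is a minimal sketch, not Texera code: `load_from_fresh_dir` is a hypothetical helper, and `tempfile` stands in for the temp filesystem the fixture uses. It only demonstrates the `sys.modules` cache hit that precedes the clear-and-reload branch, not the flaky `reload()` resolution itself.

```python
import importlib
import shutil
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "udf-v1"  # the fixed name every fresh manager starts with


def load_from_fresh_dir(source: str):
    """Hypothetical helper: write source to a new temp dir, add the dir
    to sys.path, then import the module by its fixed name."""
    tmp_dir = tempfile.mkdtemp()
    Path(tmp_dir, f"{MODULE_NAME}.py").write_text(source)
    sys.path.append(tmp_dir)
    importlib.invalidate_caches()
    return importlib.import_module(MODULE_NAME), tmp_dir


# "Spec 1": defines TestOperator; nothing is removed from sys.modules after.
first, dir_1 = load_from_fresh_dir("class TestOperator:\n    pass\n")

# "Spec 2": new directory, new source, same module name.
second, dir_2 = load_from_fresh_dir("class CountBatchOperator:\n    count = 0\n")

# import_module returns the existing sys.modules entry; dir_2 is never read.
same_object = second is first
sees_old_class = hasattr(second, "TestOperator")
sees_new_class = hasattr(second, "CountBatchOperator")

# Cleanup performed by this sketch only.
sys.modules.pop(MODULE_NAME, None)
for d in (dir_1, dir_2):
    sys.path.remove(d)
    shutil.rmtree(d, ignore_errors=True)
```

Here the second import hands back the first module object, so the second "spec" sees `TestOperator` and never sees `CountBatchOperator`.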
When the second spec hits `udf-v1`, `load_executor_definition` enters the
cached branch:
```python
# amber/src/main/python/core/architecture/managers/executor_manager.py
if module_name in sys.modules:
    executor_module = importlib.import_module(module_name)
    executor_module.__dict__.clear()
    executor_module.__dict__["__name__"] = module_name
    executor_module = importlib.reload(executor_module)
```
`reload()` then tries to refind the module by name. On CPython 3.11 (Linux,
ubuntu-latest), the path lookup occasionally still resolves to the previous
spec / loader cache instead of the freshly-written `udf-v1.py` in the new temp
filesystem. The cached class definition (`TestOperator` from
`core/architecture/managers/test_executor_manager.py`'s `SAMPLE_OPERATOR_CODE`)
leaks into the next spec's executor, and the new spec's assertions on the
expected class (e.g. `CountBatchOperator.count`) fail with an `AttributeError`.
### Repro
CI run:
https://github.com/apache/texera/actions/runs/25263695023/job/74074970899?pr=4636
— `backport (release/v1.1.0-incubating) / python (ubuntu-latest, 3.11)`.
Test order in that run:
1. `core/architecture/managers/test_executor_manager.py::TestExecutorManager::test_accept_python_language_regular_operator`
— passes; loads `udf-v1` with `class TestOperator(UDFOperatorV2)` from
`SAMPLE_OPERATOR_CODE`.
2. (~30 specs later, alphabetical order)
3. `core/runnables/test_main_loop.py::TestMainLoop::test_batch_dp_thread_can_process_batch`
— fixture `mock_initialize_batch_count_executor` sends
`OpExecWithCode(inspect.getsource(CountBatchOperator), "python")`. The handler
calls `executor_manager.initialize_executor(code, ...)`, which calls
`load_executor_definition(code)`. The new `ExecutorManager` starts at
`executor_version = 0` → generates `udf-v1`. The cached entry from step 1 wins.
4. The test eventually does `assert executor.count == 1` and gets
`AttributeError: 'TestOperator' object has no attribute 'count'`.
The same code passes on Python 3.10 / 3.12 / 3.13 in the same backport job,
and on the direct `build / python (3.11)` job for the same PR. The 3.11
importlib path on this particular fs/timing combination is what trips the
cache. PR #4636 (pip → uv install switch) does not introduce the bug; it merely
shifts transitive package versions and timing enough to change the latent
collision rate.
### Branch
main (also reproducible on `release/v1.1.0-incubating`)
### Commit Hash (Optional)
8ce4ad511a9ac8fbcc2b37d5548513ae81029697
### Relevant log output
```
core/runnables/test_main_loop.py:851: AttributeError
> assert executor.count == 1
E AttributeError: 'TestOperator' object has no attribute 'count'
================== 1 failed, 218 passed, 5 warnings in 45.60s ==================
```
### Likely fix direction
The collision goes away if module names are unique per `ExecutorManager`
instance instead of starting at `udf-v1` every time. Two reasonable shapes:
- **Per-instance UUID prefix** — `module_name =
f"udf-{uuid.uuid4().hex}-v{version}"`. Names never collide across specs, the
clear+reload branch becomes unreachable in tests, and production behavior is
unchanged.
- **Lifecycle-aware close** — also pop `self.operator_module_name` from
`sys.modules` and remove the temp dir from `sys.path` in `close()`. Strictly
fewer leaks but still relies on every test path calling `close`.
The first option (per-instance UUID prefix) is the smaller, more defensive change.
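Both shapes could look roughly like the sketch below. `SketchExecutorManager` is a hypothetical stand-in, not the real `ExecutorManager`: `instance_id` and `tmp_dir` are illustrative names, and `tempfile` replaces the temp filesystem the real class uses. Only `executor_version`, `operator_module_name`, `load_executor_definition`, and `close` are taken from the description above.

```python
import importlib
import shutil
import sys
import tempfile
import uuid
from pathlib import Path


class SketchExecutorManager:
    """Hypothetical stand-in for ExecutorManager (assumption, not Texera code)."""

    def __init__(self) -> None:
        # Option 1: a per-instance UUID keeps module names unique across managers.
        self.instance_id = uuid.uuid4().hex
        self.executor_version = 0
        self.operator_module_name = None
        self.tmp_dir = tempfile.mkdtemp()
        sys.path.append(self.tmp_dir)

    def load_executor_definition(self, code: str):
        self.executor_version += 1
        self.operator_module_name = (
            f"udf-{self.instance_id}-v{self.executor_version}"
        )
        Path(self.tmp_dir, f"{self.operator_module_name}.py").write_text(code)
        importlib.invalidate_caches()
        return importlib.import_module(self.operator_module_name)

    def close(self) -> None:
        # Option 2: lifecycle-aware close drops the module and the path entry.
        if self.operator_module_name is not None:
            sys.modules.pop(self.operator_module_name, None)
        if self.tmp_dir in sys.path:
            sys.path.remove(self.tmp_dir)
        shutil.rmtree(self.tmp_dir, ignore_errors=True)


# Two managers, as two test fixtures would create them.
a = SketchExecutorManager()
b = SketchExecutorManager()
mod_a = a.load_executor_definition("class TestOperator:\n    pass\n")
mod_b = b.load_executor_definition("class CountBatchOperator:\n    count = 0\n")
name_a, name_b = a.operator_module_name, b.operator_module_name
a.close()
b.close()
```

With unique names, the second manager loads its own `CountBatchOperator` even though the first manager's module was still registered at that point. The `close()` cleanup is then a second line of defense rather than the only one.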
### Out of scope
- Reproducing this with `uv` vs `pip`. The cause is the static module name
and `sys.modules` reuse; transitive package versions only affect the timing.