Tanishq1030 commented on issue #19711:
URL: https://github.com/apache/beam/issues/19711#issuecomment-3793849315
# Root Cause Analysis: Instruction ID Missing in Lifecycle Methods
I have been investigating this issue and believe I've identified the root
cause, specifically regarding why this occurs in lifecycle methods like
`setup()` as noted by @hughack.
## Root Cause
The `FnApiLogRecordHandler` currently relies exclusively on
`statesampler.get_current_instruction_id()` to populate the `instruction_id`
field in log entries.
```python
# runners/worker/statesampler.py
_INSTRUCTION_IDS = threading.local()
```
The `statesampler` uses thread-local storage. During the execution of
`DoFn.setup()`, the worker's state sampler context appears to be either
uninitialized or not accessible on the thread executing the logging logic,
causing `get_current_instruction_id()` to return `None`.
## Proposed Fix
We can fix this by allowing the `instruction_id` to be explicitly carried by
the `LogRecord` object itself, rather than relying solely on the global
thread-local state. This allows us to inject the ID via a `LoggerAdapter` or
filter upstream in `sdk_worker.py` during lifecycle management.
I propose modifying `FnApiLogRecordHandler.emit` in
`sdks/python/apache_beam/runners/worker/log_handler.py` to prioritize an
attribute check:
```python
# In log_handler.py -> emit()
# Priority 1: Check if the record already carries the ID (via
LoggerAdapter/Context)
if hasattr(record, 'instruction_id'):
log_entry.instruction_id = record.instruction_id
# Priority 2: Fallback to thread-local storage (legacy/current behavior)
if not log_entry.instruction_id:
instruction_id = statesampler.get_current_instruction_id()
if instruction_id:
log_entry.instruction_id = instruction_id
```
## Benefits
This change is safe as it preserves the existing behavior while enabling the
`sdk_worker` to explicitly attach context to logs during `setup` and
`start_bundle`.
## Next Steps
I am preparing a PR to implement this change and the corresponding update in
`sdk_worker.py`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]