Tanishq1030 commented on issue #19711:
URL: https://github.com/apache/beam/issues/19711#issuecomment-3793849315

   # Root Cause Analysis: Instruction ID Missing in Lifecycle Methods
   
   I have been investigating this issue and believe I've identified the root 
cause, specifically regarding why this occurs in lifecycle methods like 
`setup()` as noted by @hughack.
   
   ## Root Cause
   
   The `FnApiLogRecordHandler` currently relies exclusively on 
`statesampler.get_current_instruction_id()` to populate the `instruction_id` 
field in log entries.
   
   ```python
   # runners/worker/statesampler.py
   _INSTRUCTION_IDS = threading.local()
   ```
   
   The `statesampler` uses thread-local storage. During the execution of 
`DoFn.setup()`, the worker's state sampler context appears to be either 
uninitialized or not accessible on the thread executing the logging logic, 
causing `get_current_instruction_id()` to return `None`.
   
   ## Proposed Fix
   
   We can fix this by allowing the `instruction_id` to be explicitly carried by 
the `LogRecord` object itself, rather than relying solely on the global 
thread-local state. This allows us to inject the ID via a `LoggerAdapter` or 
filter upstream in `sdk_worker.py` during lifecycle management.
   
   I propose modifying `FnApiLogRecordHandler.emit` in 
`sdks/python/apache_beam/runners/worker/log_handler.py` to prioritize an 
attribute check:
   
   ```python
   # In log_handler.py -> emit()
   
   # Priority 1: Check if the record already carries the ID (via 
LoggerAdapter/Context)
   if hasattr(record, 'instruction_id'):
       log_entry.instruction_id = record.instruction_id
   
   # Priority 2: Fallback to thread-local storage (legacy/current behavior)
   if not log_entry.instruction_id:
       instruction_id = statesampler.get_current_instruction_id()
       if instruction_id:
           log_entry.instruction_id = instruction_id
   ```
   
   ## Benefits
   
   This change is safe as it preserves the existing behavior while enabling the 
`sdk_worker` to explicitly attach context to logs during `setup` and 
`start_bundle`.
   
   ## Next Steps
   
   I am preparing a PR to implement this change and the corresponding update in 
`sdk_worker.py`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to