Tanishq1030 opened a new pull request, #37413:
URL: https://github.com/apache/beam/pull/37413
Fixes #19711
This PR addresses the issue where `step_id` (instruction ID) was
consistently missing or empty in worker logs generated during the
`DoFn.setup()` lifecycle method.
### Rationale
The `FnApiLogRecordHandler` relies on `statesampler` thread-local storage to
populate the `instruction_id` in log entries. Previously, the `BundleProcessor`
called `setup()` on its operations *before* the thread-local context was
initialized for that instruction, so logs emitted during setup carried no
`instruction_id`.
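
The ordering problem can be illustrated with a minimal, standard-library-only sketch. It is not Beam code: `_context`, `current_instruction_id`, and `CapturingHandler` are hypothetical stand-ins for the `statesampler` thread-local state and the log handler, and the sketch only assumes that the handler reads the ID at emit time.

```python
import logging
import threading

# Hypothetical stand-in for the statesampler thread-local context.
_context = threading.local()


def current_instruction_id():
    """Return the instruction ID bound to this thread, or None if unset."""
    return getattr(_context, "instruction_id", None)


class CapturingHandler(logging.Handler):
    """Stores (message, instruction_id) pairs, reading the ID at emit time."""

    def __init__(self):
        super().__init__()
        self.entries = []

    def emit(self, record):
        self.entries.append((record.getMessage(), current_instruction_id()))


handler = CapturingHandler()
logger = logging.getLogger("setup_order_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the sketch's output out of the root logger
logger.addHandler(handler)

# Simulates setup() running before the context is bound: no ID is available.
logger.info("from setup")

# Simulates bundle processing after the context has been bound.
_context.instruction_id = "bundle_1"
logger.info("from process")
```

In this sketch the first entry is recorded with `None` as its ID, which mirrors the behavior described above.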
### Changes
1. **`sdks/python/apache_beam/runners/worker/sdk_worker.py`**: Updated
`create_bundle_processor` to pass the active `instruction_id` into the
`BundleProcessor` constructor.
2. **`sdks/python/apache_beam/runners/worker/bundle_processor.py`**:
* Updated `__init__` to accept `instruction_id`.
* Added logic to manually inject the `instruction_id` into the
`statesampler` context specifically while iterating through operations to call
`op.setup()`.
3. **`sdks/python/apache_beam/runners/worker/log_handler.py`**: Updated
`emit()` to check `record.instruction_id` before falling back to thread-local
storage, ensuring explicitly injected IDs are respected.
### Verification
I verified this fix locally with a reproduction script that emits a log
record during `setup()`.
* **Before fix:** Logs during `setup()` had `instruction_id: None`.
* **After fix:** Logs during `setup()` correctly display the
`instruction_id` (e.g., `bundle_...`).
------------------------
- [x] Mention the appropriate issue in your description (e.g. `fixes
#19711`).
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).