Ox0400 commented on issue #35013:
URL: https://github.com/apache/beam/issues/35013#issuecomment-2896874107
You're absolutely right: there is no way to set the log level in
DataflowPythonJobOp. The only workaround is to configure it in data_clean.py,
which is what I did by adding:
```python
import logging

logging.getLogger('apache_beam.runners.dataflow.internal.apiclient').setLevel(logging.INFO)
```
However, this solution is far from ideal:
- Manual Overhead – Every script must set this explicitly, adding unnecessary
boilerplate.
- Undocumented Pitfall – There is zero documentation warning about this
requirement. Debugging this trivial issue took me days, only to discover that
the default log level was filtering out the very line from which the parent
process retrieves the job ID (see the sketch below).
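For context, here is a minimal sketch of the kind of log scraping a launcher
can depend on. The regex, message format, and function name are my own
illustrative assumptions, not the actual DataflowPythonJobOp implementation:

```python
import re
from typing import Optional

# Hypothetical illustration: a launcher that extracts the Dataflow job ID
# from the log output of the submitting process. If the line carrying the
# ID is emitted at INFO but the logger is capped at WARNING, the line never
# appears and this scan silently returns None.
JOB_ID_PATTERN = re.compile(r"id: '(?P<job_id>[0-9A-Za-z_-]+)'")

def extract_job_id(log_text: str) -> Optional[str]:
    match = JOB_ID_PATTERN.search(log_text)
    return match.group("job_id") if match else None
```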
Key Flaws in Current Design:
- Poor Defaults – Logs critical for debugging (e.g., job IDs) should always be
visible by default, not hidden behind manual configuration.
- Fragile UX – Even if documented, users might overlook this step, leading to
avoidable failures. Adding a "warning" for a single line of code feels like a
band-aid rather than a proper fix.
Suggested Improvement:
The framework should auto-enable INFO logs for critical components (like job
tracking) or fail fast with clear errors if log filtering blocks essential
data. Silent failures due to log levels are a developer nightmare.
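To make the fail-fast idea concrete, here is a minimal sketch. The helper name
and the choice to raise are assumptions of mine, not an existing Beam or
pipeline-component API:

```python
import logging

def ensure_job_id_logs_visible() -> None:
    # Hypothetical guard: verify that the logger whose output carries the
    # job ID will actually emit INFO records before the job is launched.
    logger = logging.getLogger('apache_beam.runners.dataflow.internal.apiclient')
    if not logger.isEnabledFor(logging.INFO):
        raise RuntimeError(
            "Log level for apache_beam.runners.dataflow.internal.apiclient "
            "suppresses INFO records, so the parent process cannot read the "
            "Dataflow job ID. Set it to INFO (or lower) before launching."
        )
```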
Certainly, there are two other options (a sketch of the second follows below):
1. Use `print()` directly.
2. Set the log level of the current file's `LOGGER` to `INFO`.
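A minimal sketch of option 2, assuming the script defines a module-level
`LOGGER`; the `basicConfig` call is my addition so that INFO records actually
reach a handler:

```python
import logging

# Ensure the root logger has a handler that passes INFO records through,
# then set this file's own LOGGER to INFO so nothing it emits is filtered.
logging.basicConfig(level=logging.INFO)
LOGGER = logging.getLogger(__name__)
LOGGER.setLevel(logging.INFO)

LOGGER.info("Pipeline submission starting")  # now visible in the log output
```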