Ox0400 commented on issue #35013: URL: https://github.com/apache/beam/issues/35013#issuecomment-2896874107
You're absolutely right: it's impossible to set the log level in `DataflowPythonJobOp`. The only workaround is to configure it in `data_clean.py`, which is what I did by adding:

```python
logging.getLogger('apache_beam.runners.dataflow.internal.apiclient').setLevel(logging.INFO)
```

However, this solution is far from ideal:

- **Manual overhead**: every script must set this explicitly, adding unnecessary boilerplate.
- **Undocumented pitfall**: there is zero documentation warning about this requirement. Debugging this trivial issue took me days, only to discover it was caused by an insufficient log level that prevented the parent process from retrieving the job ID.

Key flaws in the current design:

- **Poor defaults**: logs critical for debugging (e.g., job IDs) should always be visible by default, not hidden behind manual configuration.
- **Fragile UX**: even if documented, users may overlook this step, leading to avoidable failures. Documenting a single required line of code feels like a band-aid rather than a proper fix.

Suggested improvement: the framework should auto-enable INFO logs for critical components (like job tracking), or fail fast with a clear error when log filtering blocks essential data. Silent failures caused by log levels are a developer nightmare. A sketch of where this workaround sits in the launch script is shown after the list below.

## Certainly, there are two other options:

1. Use `print()` directly.
2. Set the log level of the current file's `LOGGER` to `INFO` (see the second sketch below).
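For concreteness, here is a minimal sketch of where that workaround line lives in the launch script. This is an illustration under assumptions, not the actual `data_clean.py`: the pipeline body and options are placeholders, and it assumes the launcher environment has already attached a log handler so that INFO records from the apiclient logger reach the job output.

```python
import logging

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Workaround: raise this specific logger to INFO so the log line carrying
# the Dataflow job ID is actually emitted. Without it, DataflowPythonJobOp
# cannot recover the job ID from the launcher's output.
logging.getLogger(
    'apache_beam.runners.dataflow.internal.apiclient'
).setLevel(logging.INFO)


def run():
    # Placeholder options; a real script would also set project, region,
    # temp_location, etc.
    options = PipelineOptions(runner='DataflowRunner')
    with beam.Pipeline(options=options) as pipeline:
        pipeline | 'Create' >> beam.Create(['placeholder'])


if __name__ == '__main__':
    run()
```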
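And a sketch of the two fallback options. Note the assumptions here: `result.job_id()` is the accessor on the Dataflow runner's result object (verify it against your Beam version), and whether `DataflowPythonJobOp` actually picks up a hand-printed or hand-logged line depends on the exact log pattern it scans for.

```python
import logging

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Option 2: a module-level logger pinned to INFO, independent of whatever
# level the framework configures globally.
LOGGER = logging.getLogger(__name__)
LOGGER.setLevel(logging.INFO)


def run():
    options = PipelineOptions(runner='DataflowRunner')
    pipeline = beam.Pipeline(options=options)
    pipeline | 'Create' >> beam.Create(['placeholder'])
    result = pipeline.run()

    # Option 1: print() bypasses logging levels entirely.
    print(f'Dataflow job id: {result.job_id()}')

    # Option 2: emit the job ID through this file's own logger at INFO.
    LOGGER.info('Dataflow job id: %s', result.job_id())

    result.wait_until_finish()


if __name__ == '__main__':
    run()
```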