Ox0400 commented on issue #35013:
URL: https://github.com/apache/beam/issues/35013#issuecomment-2896874107

You're absolutely right: it's impossible to set the log level in DataflowPythonJobOp. The only workaround is to configure it in data_clean.py, which is what I did by adding:

```python
import logging

logging.getLogger('apache_beam.runners.dataflow.internal.apiclient').setLevel(logging.INFO)
```
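For context, here is a minimal sketch of where that line sits in data_clean.py. The pipeline body and the `basicConfig` call are illustrative assumptions, not the actual script:

```python
import logging

import apache_beam as beam

# Assumption: make sure a handler exists at INFO level; the launcher may
# already configure one, in which case this line is redundant.
logging.basicConfig(level=logging.INFO)

# The workaround: raise the apiclient logger to INFO *before* the job is
# launched, so the INFO message carrying the job id is actually emitted
# and the parent process can retrieve it.
logging.getLogger(
    'apache_beam.runners.dataflow.internal.apiclient'
).setLevel(logging.INFO)


def run():
    # Placeholder pipeline; the real data_clean.py logic goes here.
    with beam.Pipeline() as pipeline:
        (pipeline
         | 'Create' >> beam.Create(['a', 'b', 'c'])
         | 'Upper' >> beam.Map(str.upper))


if __name__ == '__main__':
    run()
```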
However, this solution is far from ideal:

- Manual overhead – every script must explicitly set this line, adding unnecessary boilerplate.
- Undocumented pitfall – there is zero documentation warning about this requirement. Debugging this trivial issue took me days, only to discover it was caused by an insufficient log level preventing the parent process from retrieving the job ID.
Key flaws in the current design:

- Poor defaults – logs critical for debugging (e.g., job IDs) should always be visible by default, not hidden behind manual configuration.
- Fragile UX – even if documented, users might overlook this step, leading to avoidable failures. Adding a warning for a single line of code feels like a band-aid rather than a proper fix.
Suggested improvement:

The framework should auto-enable INFO logs for critical components (like job tracking), or fail fast with a clear error when log filtering blocks essential data. Silent failures caused by log levels are a developer nightmare.
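
As a rough illustration of the fail-fast idea, the launcher could check the relevant logger before submitting the job. This is only a sketch: `ensure_job_id_visible` is a hypothetical helper, not an existing Beam or KFP API.

```python
import logging

# The logger that emits the job id at INFO, per the workaround above.
_APICLIENT_LOGGER = 'apache_beam.runners.dataflow.internal.apiclient'


def ensure_job_id_visible() -> None:
    """Fail fast if log filtering would swallow the Dataflow job id."""
    logger = logging.getLogger(_APICLIENT_LOGGER)
    if not logger.isEnabledFor(logging.INFO):
        raise RuntimeError(
            f'Logger {_APICLIENT_LOGGER!r} is filtered above INFO; the '
            'Dataflow job id is logged at INFO and would be lost. Call '
            'logging.getLogger(...).setLevel(logging.INFO) before launching.'
        )
```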
   
## There are two other options (sketched below)

1. Use `print()` directly.
2. Set the log level of the current file's `LOGGER` to `INFO`.
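
A minimal sketch of both options; the job-id value is a placeholder, since in practice it would be parsed from the launch output:

```python
import logging

logging.basicConfig(level=logging.INFO)  # assumption: no handler configured yet

# Option 2: give the current file its own LOGGER and force it to INFO so its
# messages are not filtered out.
LOGGER = logging.getLogger(__name__)
LOGGER.setLevel(logging.INFO)

job_id = '2024-01-01_00_00_00-1234567890'  # placeholder, not a real job id

# Option 1: print() writes straight to stdout, so no log-level filtering can
# swallow the message.
print(f'Dataflow job id: {job_id}')

# Option 2 in action: an INFO record through the module logger.
LOGGER.info('Dataflow job id: %s', job_id)
```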

