James Hutchison created BEAM-7930:
-------------------------------------

Summary: bundle_processor log spam using python SDK on dataflow runner
Key: BEAM-7930
URL: https://issues.apache.org/jira/browse/BEAM-7930
Project: Beam
Issue Type: Bug
Components: runner-dataflow
Affects Versions: 2.13.0
Reporter: James Hutchison

When running my pipeline on Dataflow, the Stackdriver logs fill up with a large volume of the following messages (the numbers in them change with every message):

* [INFO] (bundle_processor.create_operation) No unique name set for transform generatedPtransform-67
* [INFO] (bundle_processor.create_operation) No unique name for transform -19
* [ERROR] (bundle_processor.create) Missing required coder_id on grpc_port for -19; using deprecated fallback.

I set a breakpoint where these log messages originate and ran the pipeline with the direct runner, but the breakpoint was never hit, so I don't know specifically what is causing them. I also tried raising the threshold through the logging module, and I mocked out the logging attribute in the bundle_processor module to change the log level to CRITICAL. I still see the log messages.

The pipeline is a streaming pipeline that reads from two Pub/Sub topics, merges the inputs, runs distinct on the inputs over each processing-time window, fetches from an external service, does processing, and inserts into Elasticsearch, with failures going into BigQuery.

The log messages seem to cluster, and they appear early on, before any log messages from the other steps, so I wonder whether they come from the Pub/Sub read or the windowing portion.

Expected behavior:

* I don't expect to see these noisy log messages, which seem to indicate something is wrong.
* The missing required coder_id message is at the ERROR log level, so it pollutes the error logs. I would expect it to be at the WARNING or INFO level.
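
For anyone trying to reproduce the suppression attempt, here is a minimal sketch of a message-based filter using only the standard logging module. It is not the code used in this report and not a confirmed workaround: the names NOISY_PATTERNS, BundleProcessorNoiseFilter and install_noise_filter are hypothetical, and it assumes the messages pass through handlers reachable from the logger it is installed on, inside the worker process. Since changing the log level had no effect on Dataflow, that assumption may not hold there.

```python
import logging
import re

# Patterns taken from the messages quoted in this report.
NOISY_PATTERNS = [
    re.compile(r"No unique name (set )?for transform"),
    re.compile(r"Missing required coder_id on grpc_port"),
]


class BundleProcessorNoiseFilter(logging.Filter):
    """Drops records whose formatted message matches a noisy pattern."""

    def filter(self, record):
        message = record.getMessage()
        return not any(p.search(message) for p in NOISY_PATTERNS)


def install_noise_filter(logger=None):
    """Attach the filter to each handler of `logger` (default: root logger)."""
    if logger is None:
        logger = logging.getLogger()
    noise_filter = BundleProcessorNoiseFilter()
    for handler in logger.handlers:
        handler.addFilter(noise_filter)
    return noise_filter
```

The filter is attached to handlers rather than to the logger itself because a logger-level filter is not consulted for records that propagate up from child loggers, while a handler-level filter sees every record the handler is asked to emit. Handlers added after install_noise_filter() runs are not covered.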