James Hutchison created BEAM-7930:
-------------------------------------

             Summary: bundle_processor log spam using python SDK on dataflow 
runner
                 Key: BEAM-7930
                 URL: https://issues.apache.org/jira/browse/BEAM-7930
             Project: Beam
          Issue Type: Bug
          Components: runner-dataflow
    Affects Versions: 2.13.0
            Reporter: James Hutchison


When running my pipeline on Dataflow, I see a large volume of the following 
messages in the Stackdriver logs (the numbers in them change with every 
message):
 * [INFO] (bundle_processor.create_operation) No unique name set for transform 
generatedPtransform-67
 * [INFO] (bundle_processor.create_operation) No unique name for transform -19
 * [ERROR] (bundle_processor.create) Missing required coder_id on grpc_port for 
-19; using deprecated fallback.

I set a breakpoint where these log messages originate and ran the pipeline 
with the direct runner, but the breakpoint was never hit, so I don't know 
specifically what is causing them.

I also tried raising the log threshold with the logging module, and I mocked 
out the logging attribute in the bundle_processor module to set the log level 
to CRITICAL, but I still see the log messages.
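
For illustration, a suppression attempt of the kind described above could be 
sketched as follows. This is a minimal sketch, not the code actually used in 
the pipeline: the names NOISY_PREFIXES, DropNoisyBundleProcessorLogs and 
install_filter are hypothetical, the prefixes are copied from the messages 
quoted in this report, and it assumes the messages pass through the standard 
Python logging handlers in the process where the filter is installed. Whether 
that holds on Dataflow workers is not confirmed, and the attempts described 
above suggest it may not.

```python
import logging

# Message prefixes copied from the log lines quoted in this report.
NOISY_PREFIXES = (
    "No unique name set for transform",
    "No unique name for transform",
    "Missing required coder_id on grpc_port",
)


class DropNoisyBundleProcessorLogs(logging.Filter):
    """Hypothetical filter that drops records starting with a noisy prefix."""

    def filter(self, record):
        # getMessage() applies any %-style arguments first, so pre-formatted
        # and lazily formatted messages are matched the same way.
        return not record.getMessage().startswith(NOISY_PREFIXES)


def install_filter(logger=None):
    """Attach the filter to a logger (root by default) and its handlers.

    Assumption: the noisy records reach this logger or one of the handlers
    attached to it at the time of the call.
    """
    if logger is None:
        logger = logging.getLogger()
    noise_filter = DropNoisyBundleProcessorLogs()
    # A logger-level filter only sees records logged directly on that logger.
    logger.addFilter(noise_filter)
    # Handler-level filters also see records propagated from child loggers.
    for handler in logger.handlers:
        handler.addFilter(noise_filter)
    return noise_filter
```

Calling install_filter() once at worker start-up would be the intended use. 
Filtering at the handler level rather than changing the log level leaves 
other INFO and ERROR messages untouched.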

The pipeline is a streaming pipeline that reads from two Pub/Sub topics, merges 
the inputs, runs distinct over each processing-time window, fetches from an 
external service, does some processing, and inserts into Elasticsearch, with 
failures going into BigQuery. The log messages seem to cluster, and they appear 
early on, before any log messages from the other steps, so I wonder whether 
they come from the Pub/Sub read or the windowing portion.

Expected behavior:
 * I don't expect to see these noisy log messages, which seem to indicate that 
something is wrong.
 * The missing required coder_id message is logged at the ERROR level, so it 
pollutes the error logs. I would expect it to be at the WARNING or INFO level.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)