[ 
https://issues.apache.org/jira/browse/BEAM-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903189#comment-16903189
 ] 

James Hutchison commented on BEAM-7930:
---------------------------------------

If this isn't already a known issue, I can try to provide more information.

> bundle_processor log spam using python SDK on dataflow runner
> -------------------------------------------------------------
>
>                 Key: BEAM-7930
>                 URL: https://issues.apache.org/jira/browse/BEAM-7930
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>    Affects Versions: 2.13.0
>            Reporter: James Hutchison
>            Priority: Minor
>
> When running my pipeline on Dataflow, the Stackdriver logs contain a large 
> volume of the following messages (the numbers in them change with every 
> message):
>  * [INFO] (bundle_processor.create_operation) No unique name set for 
> transform generatedPtransform-67
>  * [INFO] (bundle_processor.create_operation) No unique name for transform -19
>  * [ERROR] (bundle_processor.create) Missing required coder_id on grpc_port 
> for -19; using deprecated fallback.
> I set a breakpoint where these log messages originate and ran the pipeline 
> with the direct runner, but the breakpoint was never hit, so I don't know 
> specifically what is causing them.
> I also tried raising the threshold with the logging module, and mocking out 
> the logging attribute in the bundle_processor module to set the log level to 
> CRITICAL, but I still see the log messages.
> The pipeline is a streaming pipeline that reads from two Pub/Sub topics, 
> merges the inputs, runs distinct on them over each processing-time window, 
> fetches from an external service, does processing, and inserts into 
> Elasticsearch, with failures going into BigQuery. The log messages seem to 
> cluster, and they appear early on, before any log messages from the other 
> steps, so I wonder whether they come from the Pub/Sub read or the windowing 
> portion.
> Expected behavior:
>  * I don't expect to see these noisy log messages, which seem to indicate 
> that something is wrong
>  * The missing required coder_id message is at the ERROR log level so it 
> pollutes the error logs. I would expect this to be at the WARNING or INFO 
> level.
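
The quoted report says that raising the log level did not silence the messages. One alternative that could be tried is a message-based filter attached to the logging handlers, which drops records by their text rather than by level. The following is a minimal sketch using only the Python standard library. The names NOISY_PATTERNS, DropNoisyBundleProcessorLogs and install_filter are hypothetical, and the patterns are copied from the messages quoted above. It is an assumption that the Dataflow worker's handlers can be reached this way; the filter would have to be installed inside the worker process, and the report does not confirm how those handlers are configured.

```python
import logging
import re

# Patterns copied from the messages quoted in the report.
NOISY_PATTERNS = [
    re.compile(r"No unique name set for transform"),
    re.compile(r"No unique name for transform"),
    re.compile(r"Missing required coder_id on grpc_port"),
]


class DropNoisyBundleProcessorLogs(logging.Filter):
    """Hypothetical filter: rejects records whose text matches a noisy pattern."""

    def filter(self, record):
        message = record.getMessage()
        # Returning False tells the handler to discard the record.
        return not any(p.search(message) for p in NOISY_PATTERNS)


def install_filter(logger=None):
    """Attach the filter to every handler of `logger` (root logger by default).

    Filtering at the handler level also catches records that propagate up
    from child loggers, which a filter on the logger itself would not.
    """
    if logger is None:
        logger = logging.getLogger()
    noisy_filter = DropNoisyBundleProcessorLogs()
    for handler in logger.handlers:
        handler.addFilter(noisy_filter)
    return noisy_filter
```

Calling install_filter() after the handlers have been configured affects only handlers that exist at that moment; handlers added later would need the filter attached separately.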



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
