[
https://issues.apache.org/jira/browse/BEAM-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ismaël Mejía updated BEAM-7930:
-------------------------------
Component/s: sdk-py-core
> bundle_processor log spam using python SDK on dataflow runner
> -------------------------------------------------------------
>
> Key: BEAM-7930
> URL: https://issues.apache.org/jira/browse/BEAM-7930
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow, sdk-py-core
> Affects Versions: 2.13.0
> Reporter: James Hutchison
> Priority: Minor
>
> When running my pipeline on Dataflow, I see a large volume of the following
> messages in the Stackdriver logs (the numbers in them change with every
> message):
> * [INFO] (bundle_processor.create_operation) No unique name set for
> transform generatedPtransform-67
> * [INFO] (bundle_processor.create_operation) No unique name for transform -19
> * [ERROR] (bundle_processor.create) Missing required coder_id on grpc_port
> for -19; using deprecated fallback.
> I ran the pipeline locally on the direct runner with debugger breakpoints set
> where these log messages originate, but the breakpoints were never hit, so I
> don't know specifically what is causing them.
> I also tried raising the threshold with the logging module, and I mocked out
> the logging attribute in the bundle_processor module to set the log level to
> CRITICAL, but I still see the log messages.
> The pipeline is a streaming pipeline that reads from two Pub/Sub topics,
> merges the inputs, runs distinct on them over each processing-time window,
> fetches from an external service, does processing, and inserts into
> Elasticsearch, with failures going into BigQuery. The log messages seem to
> cluster, and they appear early on, before any log messages from the other
> steps, so I wonder whether they come from the Pub/Sub read or the windowing
> portion.
> Expected behavior:
> * I don't expect to see these noisy log messages, which seem to indicate that
> something is wrong.
> * The missing required coder_id message is logged at the ERROR level, so it
> pollutes the error logs. I would expect it to be at the WARNING or INFO
> level.
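
The report mentions raising the log threshold without effect and asks for the coder_id message to be logged below ERROR. As a rough illustration only, the sketch below uses a standard-library logging.Filter to drop the two INFO messages and relabel the coder_id record as WARNING. The names BundleProcessorNoiseFilter and install_noise_filter are hypothetical and are not part of Beam. The sketch also assumes that the messages pass through handlers attached to the logger it is installed on, and that the code runs inside the SDK worker process rather than only in the process that launches the pipeline. The report confirms neither assumption for the Dataflow runner.

```python
import logging

# Substrings copied from the log messages quoted in the report.
NOISY_INFO_SUBSTRINGS = (
    "No unique name set for transform",
    "No unique name for transform",
)
CODER_ID_SUBSTRING = "Missing required coder_id on grpc_port"


class BundleProcessorNoiseFilter(logging.Filter):
    """Hypothetical filter: drops the INFO spam, relabels the coder_id ERROR."""

    def filter(self, record):
        message = record.getMessage()
        if any(text in message for text in NOISY_INFO_SUBSTRINGS):
            return False  # returning False drops the record
        if CODER_ID_SUBSTRING in message and record.levelno == logging.ERROR:
            # Keep the record, but report it as WARNING instead of ERROR.
            record.levelno = logging.WARNING
            record.levelname = logging.getLevelName(logging.WARNING)
        return True


def install_noise_filter(logger=None):
    """Attach the filter to each handler of `logger` (root logger by default).

    Assumption: the noisy records reach these handlers. Handlers added
    after this call are not covered.
    """
    if logger is None:
        logger = logging.getLogger()
    noise_filter = BundleProcessorNoiseFilter()
    for handler in logger.handlers:
        handler.addFilter(noise_filter)
    return noise_filter
```

The filter is attached to handlers rather than to the logger itself because a filter on a logger only applies to records logged directly on that logger, while a filter on a handler sees every record that reaches the handler, including records propagated from child loggers.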