Adrià Arcarons created BEAM-13682:
-------------------------------------

             Summary: Pipeline fails to start when .withDynamicRead() is 
activated
                 Key: BEAM-13682
                 URL: https://issues.apache.org/jira/browse/BEAM-13682
             Project: Beam
          Issue Type: Bug
          Components: io-java-gcp
    Affects Versions: 2.35.0, 2.34.0
            Reporter: Adrià Arcarons
         Attachments: Main.java

We have a {+}Java{+}, {+}streaming{+}, pipeline that consumes messages from 
Kafka and ingests them into BigQuery. We run this pipeline in Dataflow.

Recently, we enabled the `.withDynamicRead()` option in our KafkaIO source to 
be able to auto-discover new partitions.

As a result, the pipeline won't load in Dataflow with the following (not very 
informative) error:
{noformat}
"Workflow failed. Causes: Internal Issue (fc4aaa0289f1f666): 62242584:26". 
{noformat}
This error happens before the actual job graph can be generated.

After contacting Google Support, they concluded that:
{noformat}
it is a bug in KafkaIO as ‘withDynamicReads is supposed to produce unique side 
input but it is not what is doing in your case
{noformat}
They also provided us with an internal Dataflow log message that can help to 
narrow the problem:
{noformat}
"invalid_argument: Duplicate side input name.  Side input names must be unique."
{noformat}
Some important pieces of information:
 * The error only happens if the destination sink is BigQueryIO. We have tested 
it by replacing BigQueryIO with TextIO and it works OK.
 * The error doesn't happen if pipeline is run by DirectRunner; only 
DataflowRunner.
 * We have reproduced this error in Beam SDK versions 2.34 and 2.35.

 

I've written a small pipeline that reproduces the error. Please find it 
attached in the ticket.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to