[
https://issues.apache.org/jira/browse/BEAM-13682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adrià Arcarons updated BEAM-13682:
----------------------------------
Description:
We have a {+}Java{+} {+}streaming{+} pipeline that consumes messages from
Kafka and ingests them into BigQuery. We run this pipeline in Dataflow.
Recently, we enabled the `.withDynamicRead()` option in our KafkaIO source to
be able to auto-discover new partitions.
As a result, the pipeline fails to load in Dataflow with the following (not
very informative) error:
{noformat}
"Workflow failed. Causes: Internal Issue (fc4aaa0289f1f666): 62242584:26".
{noformat}
This error happens before the actual job graph can be generated.
Google Support provided us with an internal Dataflow log message that helps
narrow down the problem:
{noformat}
"invalid_argument: Duplicate side input name. Side input names must be unique."
{noformat}
Some important pieces of information:
* The error only happens if the destination sink is BigQueryIO. We tested this
by replacing BigQueryIO with TextIO, and the pipeline then works.
* The error doesn't happen when the pipeline is run with DirectRunner, only
with DataflowRunner.
* We have reproduced this error with Beam SDK versions 2.34 and 2.35.
I've written a small pipeline that reproduces the error. Please find it
attached to the ticket [^Main.java].
was:
We have a {+}Java{+} {+}streaming{+} pipeline that consumes messages from
Kafka and ingests them into BigQuery. We run this pipeline in Dataflow.
Recently, we enabled the `.withDynamicRead()` option in our KafkaIO source to
be able to auto-discover new partitions.
As a result, the pipeline fails to load in Dataflow with the following (not
very informative) error:
{noformat}
"Workflow failed. Causes: Internal Issue (fc4aaa0289f1f666): 62242584:26".
{noformat}
This error happens before the actual job graph can be generated.
After we contacted Google Support, they concluded that:
{noformat}
it is a bug in KafkaIO as ‘withDynamicReads is supposed to produce unique side
input but it is not what is doing in your case
{noformat}
They also provided us with an internal Dataflow log message that helps narrow
down the problem:
{noformat}
"invalid_argument: Duplicate side input name. Side input names must be unique."
{noformat}
Some important pieces of information:
* The error only happens if the destination sink is BigQueryIO. We tested this
by replacing BigQueryIO with TextIO, and the pipeline then works.
* The error doesn't happen when the pipeline is run with DirectRunner, only
with DataflowRunner.
* We have reproduced this error with Beam SDK versions 2.34 and 2.35.
I've written a small pipeline that reproduces the error. Please find it
attached to the ticket [^Main.java].
> Pipeline fails to start when .withDynamicRead() is activated
> ------------------------------------------------------------
>
> Key: BEAM-13682
> URL: https://issues.apache.org/jira/browse/BEAM-13682
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Affects Versions: 2.34.0, 2.35.0
> Reporter: Adrià Arcarons
> Priority: P2
> Attachments: Main.java
>
>
> We have a {+}Java{+} {+}streaming{+} pipeline that consumes messages from
> Kafka and ingests them into BigQuery. We run this pipeline in Dataflow.
> Recently, we enabled the `.withDynamicRead()` option in our KafkaIO source to
> be able to auto-discover new partitions.
> As a result, the pipeline fails to load in Dataflow with the following (not
> very informative) error:
> {noformat}
> "Workflow failed. Causes: Internal Issue (fc4aaa0289f1f666): 62242584:26".
> {noformat}
> This error happens before the actual job graph can be generated.
> Google Support provided us with an internal Dataflow log message that helps
> narrow down the problem:
> {noformat}
> "invalid_argument: Duplicate side input name. Side input names must be
> unique."
> {noformat}
> Some important pieces of information:
> * The error only happens if the destination sink is BigQueryIO. We tested
> this by replacing BigQueryIO with TextIO, and the pipeline then works.
> * The error doesn't happen when the pipeline is run with DirectRunner, only
> with DataflowRunner.
> * We have reproduced this error with Beam SDK versions 2.34 and 2.35.
>
> I've written a small pipeline that reproduces the error. Please find it
> attached to the ticket [^Main.java].