[ 
https://issues.apache.org/jira/browse/BEAM-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

D Bodych updated BEAM-8257:
---------------------------
    Description: 
I have a dataflow job processing data from pub/sub defined like this:

*read from pub/sub -> process (my function) -> group into day windows -> write 
to BQ*

I'm using *Write.Method.FILE_LOADS* because of bounded input.
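For context, the write step is configured roughly like the sketch below. This is a minimal illustration, not the actual job's code: *DayTableDestination* is a hypothetical function mapping each windowed element to its day table, and the schema, shard count, and triggering frequency are placeholder values.
{code:java}
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.Method;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.joda.time.Duration;

rows.apply("write-bq", BigQueryIO.writeTableRows()
    // Destination is derived per element/window, e.g. my_dataset.events_20190917.
    // DayTableDestination is a hypothetical SerializableFunction returning a TableDestination.
    .to(new DayTableDestination())
    .withSchema(tableSchema)
    .withMethod(Method.FILE_LOADS)
    // FILE_LOADS on an unbounded source requires a triggering frequency and shard count.
    .withTriggeringFrequency(Duration.standardMinutes(10))
    .withNumFileShards(100)
    // Tables should be created automatically when missing:
    .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
    .withWriteDisposition(WriteDisposition.WRITE_APPEND));
{code}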

The job works fine, processing many GBs of data, but it fails and retries 
forever once it needs to create another table. The job is meant to run 
continuously and create day tables on its own; it does fine for the first few, 
but then reports indefinitely:
{code:java}
Processing stuck in step 
write-bq/BatchLoads/SinglePartitionWriteTables/ParMultiDo(WriteTables) for at 
least 05h30m00s without outputting or completing in state finish{code}
Before this happens it also throws: 
{code:java}
Load job <job_id> failed, will retry: {"errorResult": {"message":"Not found: 
Table <name_of_table> was not found in location US","reason":"notFound"}
{code}
The error itself is correct, because this table doesn't exist. The problem is 
that the job should create it on its own, since 
*CreateDisposition.CREATE_IF_NEEDED* is set.

The number of day tables created correctly without a problem depends on the 
number of workers. It seems that once some worker creates one table, its 
*CreateDisposition* effectively changes to *CREATE_NEVER*, causing the problem, 
but that's only my guess.

A similar problem was reported here, but without any definite answer:
 
https://issues.apache.org/jira/browse/BEAM-3772?focusedCommentId=16387609&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16387609

The ProcessElement definition here seems to give some clues, but I cannot 
really say how it behaves with multiple workers: 
[https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java#L138]

I'm using Apache Beam SDK 2.15.0.



> BigQueryIO - only first day table can be created, despite having 
> CreateDisposition.CREATE_IF_NEEDED
> ---------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-8257
>                 URL: https://issues.apache.org/jira/browse/BEAM-8257
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>    Affects Versions: 2.15.0
>         Environment:  Dataflow streaming pipeline 
>            Reporter: D Bodych
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
