[
https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755958#comment-16755958
]
Marco Veluscek commented on BEAM-3772:
--------------------------------------
I have upgraded my code to Apache Beam 2.9 and Scio 0.7.0 and tried again, but
the same issue remains.
The code I posted above did not change much with the upgrade.
In the Dataflow Job details on Google Cloud Console I can see the following
details:
|*userAgent*|Apache_Beam_SDK_for_Java/2.9.0|
|*scioVersion*|0.7.0|
|*scalaVersion*|2.12.8|
Looking at the BigQuery job history, I can see a successful load job that
creates the first table. As a matter of fact, the first table gets created and
data is inserted.
The second table does not get created. The job history shows several failed
load jobs.
The first successful job looks as follows:
!bigquery-success.png!
A failed job looks as follows:
!bigquery-fail.png!
To be clearer: I have tried both the DirectRunner and the DataflowRunner. The
problem only occurs with the DataflowRunner. Is this the right place to ask for
help with the DataflowRunner? If not, could you point me to the right place?
Thank you.
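For context, the table names involved follow the hourly-suffix scheme from the issue description (e.g. {{mytable_20180302_16}}). A minimal sketch of how such a suffix can be computed is below; the class and method names here are illustrative, not my exact code:

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class HourlyTableSuffix {

    // Formats a timestamp into the hourly table-id pattern seen in the
    // load jobs, e.g. "mytable_20180302_16" (base name + date + hour).
    static String tableIdFor(String base, ZonedDateTime ts) {
        return base + "_" + ts.format(DateTimeFormatter.ofPattern("yyyyMMdd_HH"));
    }

    public static void main(String[] args) {
        ZonedDateTime t = ZonedDateTime.of(2018, 3, 2, 16, 5, 0, 0, ZoneOffset.UTC);
        System.out.println(tableIdFor("mytable", t)); // prints "mytable_20180302_16"
    }
}
```

Each triggering-frequency window can thus target a different table, which is why CREATE_IF_NEEDED matters: every new hour needs a table that does not exist yet.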
> BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded
> PCollection and FILE_LOADS
> --------------------------------------------------------------------------------------------------------
>
> Key: BEAM-3772
> URL: https://issues.apache.org/jira/browse/BEAM-3772
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Affects Versions: 2.2.0, 2.3.0
> Environment: Dataflow streaming pipeline
> Reporter: Benjamin BENOIST
> Assignee: Chamikara Jayalath
> Priority: Major
> Attachments: bigquery-fail.png, bigquery-success.png
>
>
> My workflow: Kafka -> Dataflow streaming -> BigQuery
> Given that low latency isn't important in my case, I use FILE_LOADS to
> reduce costs. I'm using _BigQueryIO.Write_ with a _DynamicDestination_,
> which writes to a table with the current hour as a suffix.
> This _BigQueryIO.Write_ is configured like this:
> {code:java}
> .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
> .withMethod(Method.FILE_LOADS)
> .withTriggeringFrequency(triggeringFrequency)
> .withNumFileShards(100)
> {code}
> The first table is successfully created and is written to. But then the
> following tables are never created and I get these exceptions:
> {code:java}
> (99e5cd8c66414e7a): java.lang.RuntimeException: Failed to create load job
> with id prefix
> 5047f71312a94bf3a42ee5d67feede75_5295fbf25e1a7534f85e25dcaa9f4986_00001_00023,
> reached max retries: 3, last failed load job: {
> "configuration" : {
> "load" : {
> "createDisposition" : "CREATE_NEVER",
> "destinationTable" : {
> "datasetId" : "dev_mydataset",
> "projectId" : "myproject-id",
> "tableId" : "mytable_20180302_16"
> },
> {code}
> The _CreateDisposition_ used is _CREATE_NEVER_, contrary to the
> _CREATE_IF_NEEDED_ that was specified.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)