[ https://issues.apache.org/jira/browse/BEAM-7693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juan Urrego updated BEAM-7693:
------------------------------
    Attachment: Screenshot 2019-07-06 at 15.05.04.png

> FILE_LOADS option for inserting rows in BigQuery creates a stuck process in 
> Dataflow that saturates all the resources of the Job
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-7693
>                 URL: https://issues.apache.org/jira/browse/BEAM-7693
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-files
>    Affects Versions: 2.13.0
>         Environment: Dataflow
>            Reporter: Juan Urrego
>            Priority: Major
>         Attachments: Screenshot 2019-07-06 at 14.51.36.png, Screenshot 
> 2019-07-06 at 15.05.04.png, image-2019-07-06-15-04-17-593.png
>
>
> During a streaming job, when you insert records into BigQuery in batch using the 
> FILE_LOADS option and one of the load jobs fails, the thread that failed gets 
> stuck and eventually saturates the job's resources, making the autoscaling 
> option useless (it uses the maximum number of workers and the system latency 
> keeps going up). In some cases processing the incoming events has become 
> ridiculously slow.
> Here is an example:
> {code:java}
> BigQueryIO.writeTableRows()
>     .to(destinationTableSerializableFunction)
>     .withMethod(Method.FILE_LOADS)
>     .withJsonSchema(tableSchema)
>     .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
>     .withWriteDisposition(WriteDisposition.WRITE_APPEND)
>     .withTriggeringFrequency(Duration.standardMinutes(5))
>     .withNumFileShards(25);
> {code}
> The pipeline works like a charm, but the moment I send an invalid tableRow 
> (for instance, with a required value set to null), the pipeline starts 
> emitting messages like this:
> {code:java}
> Processing stuck in step FILE_LOADS: <StepName> in BigQuery/BatchLoads/SinglePartitionWriteTables/ParMultiDo(WriteTables) for at least 10m00s without outputting or completing in state finish
>   at java.lang.Thread.sleep(Native Method)
>   at com.google.api.client.util.Sleeper$1.sleep(Sleeper.java:42)
>   at com.google.api.client.util.BackOffUtils.next(BackOffUtils.java:48)
>   at org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJobManager.nextBackOff(BigQueryHelpers.java:159)
>   at org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJobManager.waitForDone(BigQueryHelpers.java:145)
>   at org.apache.beam.sdk.io.gcp.bigquery.WriteTables$WriteTablesDoFn.finishBundle(WriteTables.java:255)
>   at org.apache.beam.sdk.io.gcp.bigquery.WriteTables$WriteTablesDoFn$DoFnInvoker.invokeFinishBundle(Unknown Source)
> {code}
> It's clear that the step keeps running even after it has failed. The BigQuery 
> job reports that the load failed, but Dataflow keeps waiting for a response, 
> even though the job is never executed again. At the same time, no message is 
> sent to the DropInputs step; even though I created my own dead-letter step, 
> the process thinks it hasn't failed yet.
> The only workaround I have found so far is to pre-validate all the fields 
> before writing, but I was expecting the database to do that for me, especially 
> in some extreme cases (like decimal numbers or constraint violations). Please 
> help fix this issue; otherwise the batch option in streaming jobs is almost 
> useless, because I can't trust the library itself to manage dead letters properly.
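> As an illustration of that workaround, here is a minimal sketch of a 
> pre-validation step with a dead-letter output, assuming the usual Beam SDK 
> imports (DoFn, ParDo, TupleTag, TupleTagList, PCollection, PCollectionTuple) 
> and com.google.api.services.bigquery.model.TableRow; the rows PCollection, 
> the requiredFields list and the tag names are placeholders, not part of the 
> pipeline above:
> {code:java}
> // Illustrative sketch only: route rows that would violate a REQUIRED column
> // to a dead-letter output before they ever reach the FILE_LOADS write.
> final TupleTag<TableRow> validTag = new TupleTag<TableRow>() {};
> final TupleTag<TableRow> deadLetterTag = new TupleTag<TableRow>() {};
> // Placeholder: the REQUIRED column names from the table's JSON schema.
> final List<String> requiredFields = Arrays.asList("id", "created_at");
>
> // "rows" is the PCollection<TableRow> that currently feeds writeTableRows().
> PCollectionTuple validated = rows.apply("PreValidateRows",
>     ParDo.of(new DoFn<TableRow, TableRow>() {
>       @ProcessElement
>       public void processElement(ProcessContext c) {
>         TableRow row = c.element();
>         for (String field : requiredFields) {
>           if (row.get(field) == null) {
>             c.output(deadLetterTag, row); // divert invalid row to dead letter
>             return;
>           }
>         }
>         c.output(row); // valid row continues to the FILE_LOADS write
>       }
>     }).withOutputTags(validTag, TupleTagList.of(deadLetterTag)));
>
> PCollection<TableRow> validRows = validated.get(validTag);
> PCollection<TableRow> deadLetterRows = validated.get(deadLetterTag);
> // validRows then feeds the BigQueryIO.writeTableRows() transform shown above;
> // deadLetterRows can be written to a separate table or bucket for inspection.
> {code}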
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
