[
https://issues.apache.org/jira/browse/BEAM-7693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Juan Urrego updated BEAM-7693:
------------------------------
Attachment: Screenshot 2019-07-06 at 14.51.36.png
> FILE_LOADS option for inserting rows in BigQuery creates a stuck process in
> Dataflow that saturates all the resources of the Job
> --------------------------------------------------------------------------------------------------------------------------------
>
> Key: BEAM-7693
> URL: https://issues.apache.org/jira/browse/BEAM-7693
> Project: Beam
> Issue Type: Bug
> Components: io-java-files
> Affects Versions: 2.13.0
> Environment: Dataflow
> Reporter: Juan Urrego
> Priority: Major
> Attachments: Screenshot 2019-07-06 at 14.51.36.png
>
>
> During a streaming job, when you insert records into BigQuery in batches using the
> FILE_LOADS option and one of the load jobs fails, the thread that failed gets
> stuck and eventually saturates the job's resources, making the autoscaling
> option useless (it uses the maximum number of workers and the system latency keeps
> going up). In some cases the pipeline has become ridiculously slow at processing the
> incoming events.
> Here is an example:
> {code:java}
> BigQueryIO.writeTableRows()
> .to(destinationTableSerializableFunction)
> .withMethod(Method.FILE_LOADS)
> .withJsonSchema(tableSchema)
> .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
> .withWriteDisposition(WriteDisposition.WRITE_APPEND)
> .withTriggeringFrequency(Duration.standardMinutes(5))
> .withNumFileShards(25);
> {code}
> The pipeline works like a charm, but the moment I send a bad TableRow
> (for instance, with a required value set to null) the pipeline starts emitting
> these messages:
> {code:java}
> Processing stuck in step FILE_LOADS: <StepName> in
> BigQuery/BatchLoads/SinglePartitionWriteTables/ParMultiDo(WriteTables) for at
> least 10m00s without outputting or completing in state finish at
> java.lang.Thread.sleep(Native Method) at
> com.google.api.client.util.Sleeper$1.sleep(Sleeper.java:42) at
> com.google.api.client.util.BackOffUtils.next(BackOffUtils.java:48) at
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJobManager.nextBackOff(BigQueryHelpers.java:159)
> at
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJobManager.waitForDone(BigQueryHelpers.java:145)
> at
> org.apache.beam.sdk.io.gcp.bigquery.WriteTables$WriteTablesDoFn.finishBundle(WriteTables.java:255)
> at
> org.apache.beam.sdk.io.gcp.bigquery.WriteTables$WriteTablesDoFn$DoFnInvoker.invokeFinishBundle(Unknown
> Source)
> {code}
> It's clear that the step keeps running even after it failed. The BigQuery job
> reports that the load failed, but Dataflow keeps waiting for a
> response, even though the job is never executed again. At the same time, no
> message is sent to the DropInputs step; even though I created my own
> dead-letter step, the process thinks it hasn't failed yet.
> The only option I have found so far is to pre-validate all the fields
> beforehand, but I was expecting the database to do that for me, especially in some
> extreme cases (like decimal numbers or constraint limitations). Please help
> fix this issue; otherwise the batch option in streaming jobs is almost
> useless, because I can't trust the library itself to manage dead letters properly.
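> As a reference for the workaround, here is a minimal sketch of what I mean by
> pre-validating: a multi-output ParDo that checks required fields before
> BigQueryIO and routes bad rows to a dead-letter output. The input collection
> name "rows", the field name "requiredField" and the tag names are hypothetical,
> not taken from the real pipeline:
> {code:java}
> // Sketch only: validate TableRows before BigQueryIO so invalid rows never
> // reach the load job. "requiredField" is an illustrative placeholder.
> final TupleTag<TableRow> VALID_ROWS = new TupleTag<TableRow>() {};
> final TupleTag<TableRow> DEAD_LETTER = new TupleTag<TableRow>() {};
>
> PCollectionTuple validated = rows.apply("PreValidate",
>     ParDo.of(new DoFn<TableRow, TableRow>() {
>       @ProcessElement
>       public void processElement(ProcessContext c) {
>         TableRow row = c.element();
>         if (row.get("requiredField") == null) {
>           c.output(DEAD_LETTER, row); // route invalid row to the dead-letter output
>         } else {
>           c.output(row); // main output continues to the BigQueryIO write
>         }
>       }
>     }).withOutputTags(VALID_ROWS, TupleTagList.of(DEAD_LETTER)));
>
> validated.get(VALID_ROWS).apply("WriteToBigQuery", BigQueryIO.writeTableRows()
>     .to(destinationTableSerializableFunction)
>     .withMethod(Method.FILE_LOADS)
>     .withJsonSchema(tableSchema)
>     .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
>     .withWriteDisposition(WriteDisposition.WRITE_APPEND)
>     .withTriggeringFrequency(Duration.standardMinutes(5))
>     .withNumFileShards(25));
>
> // The DEAD_LETTER collection can then be written to an error table or sink.
> {code}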
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)