Juan Urrego created BEAM-7693:
---------------------------------
Summary: FILE_LOADS option for inserting rows in BigQuery creates
a stuck process in Dataflow that saturates all the resources of the Job
Key: BEAM-7693
URL: https://issues.apache.org/jira/browse/BEAM-7693
Project: Beam
Issue Type: Bug
Components: io-java-files
Affects Versions: 2.13.0
Environment: Dataflow
Reporter: Juan Urrego
During a streaming job, when you insert records into BigQuery in batches using the
FILE_LOADS option and one of the load jobs fails, the thread that failed gets
stuck and eventually saturates the job's resources, making the autoscaling
option useless (the job uses the maximum number of workers and the system latency
keeps going up). In some cases the pipeline becomes ridiculously slow at processing
the incoming events.
Here is an example:
{code:java}
BigQueryIO.writeTableRows()
    .to(destinationTableSerializableFunction)  // dynamic table destination per record
    .withMethod(Method.FILE_LOADS)             // batch load jobs instead of streaming inserts
    .withJsonSchema(tableSchema)
    .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
    .withWriteDisposition(WriteDisposition.WRITE_APPEND)
    .withTriggeringFrequency(Duration.standardMinutes(5))  // fire a load job every 5 minutes
    .withNumFileShards(25);                    // required when a triggering frequency is set
{code}
The pipeline works like a charm, but the moment I send a bad TableRow (for
instance, with a required field set to null) the pipeline starts emitting these
messages:
{code:java}
Processing stuck in step FILE_LOADS: <StepName> in BigQuery/BatchLoads/SinglePartitionWriteTables/ParMultiDo(WriteTables) for at least 10m00s without outputting or completing in state finish
  at java.lang.Thread.sleep(Native Method)
  at com.google.api.client.util.Sleeper$1.sleep(Sleeper.java:42)
  at com.google.api.client.util.BackOffUtils.next(BackOffUtils.java:48)
  at org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJobManager.nextBackOff(BigQueryHelpers.java:159)
  at org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJobManager.waitForDone(BigQueryHelpers.java:145)
  at org.apache.beam.sdk.io.gcp.bigquery.WriteTables$WriteTablesDoFn.finishBundle(WriteTables.java:255)
  at org.apache.beam.sdk.io.gcp.bigquery.WriteTables$WriteTablesDoFn$DoFnInvoker.invokeFinishBundle(Unknown Source)
{code}
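For reference, a row like the following is enough to trigger the behavior. The
schema and field names here are hypothetical; any row that violates a REQUIRED
constraint in the destination table will do:
{code:java}
// Hypothetical destination schema where "name" is a REQUIRED STRING field.
// Loading this row makes the BigQuery load job fail, after which the
// WriteTables step gets stuck retrying as shown in the log above.
TableRow badRow = new TableRow()
    .set("name", null)   // violates the REQUIRED constraint
    .set("value", 42);
{code}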
It's clear that the step keeps running even though it has failed. The BigQuery
job reports that the load failed, but Dataflow keeps waiting for a response,
even though the job is never executed again. At the same time, no message is
sent to the DropInputs step; even though I created my own dead-letter step, the
process thinks that the element hasn't failed yet.
The only workaround I have found so far is to pre-validate all the fields
beforehand, but I was expecting the database to do that for me, especially in
some edge cases (like decimal precision or constraint violations). Please help
fix this issue; otherwise the batch option in streaming jobs is almost useless,
because I can't trust the library itself to manage dead letters properly.
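For anyone hitting the same problem, here is a minimal sketch of that
pre-validation workaround using a multi-output ParDo. The names "rows" (a
PCollection<TableRow>), "requiredField", and the tags are hypothetical
placeholders, not part of the pipeline above:
{code:java}
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

final TupleTag<TableRow> validRows = new TupleTag<TableRow>() {};
final TupleTag<TableRow> deadLetterRows = new TupleTag<TableRow>() {};

PCollectionTuple validated = rows.apply("PreValidate",
    ParDo.of(new DoFn<TableRow, TableRow>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        TableRow row = c.element();
        if (row.get("requiredField") == null) {
          // Route invalid rows to the dead-letter output instead of BigQuery
          c.output(deadLetterRows, row);
        } else {
          c.output(row);
        }
      }
    }).withOutputTags(validRows, TupleTagList.of(deadLetterRows)));

// validated.get(validRows) is then written with the BigQueryIO transform above;
// validated.get(deadLetterRows) goes to a dead-letter sink of your choice.
{code}
Of course this only catches what you validate explicitly, so it is a stopgap
rather than a substitute for proper failure handling inside BigQueryIO.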
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)