June Oh created BEAM-8906:
-----------------------------

             Summary: Long BigQuery dry runs cause avalanche delay
                 Key: BEAM-8906
                 URL: https://issues.apache.org/jira/browse/BEAM-8906
             Project: Beam
          Issue Type: Bug
          Components: io-java-gcp
    Affects Versions: 2.16.0
         Environment: Google Cloud Platform
            Reporter: June Oh


Reproduction Steps:

1. Compose a BigQuery SELECT query that will take over 80 seconds for a dry run.
2. Run the query with Beam SDK's BigQueryIO.
3. Observe the 10+ minute delay before the actual query job is created.

When running readTableRows(), BigQueryIO attempts to estimate the query size by 
performing a dry run, even if withoutValidation() is set. If the request takes 
over 80 seconds (RetryHttpRequestInitializer.HANGING_GET_TIMEOUT_SEC), 
RetryHttpRequestInitializer will time out and retry, up to 9 times 
(BigQueryServicesImpl.MAX_RPC_RETRIES). Hence, once a dry run duration crosses 
the 80 second tipping point, it causes an inevitable avalanche of a 720-second 
delay. Considering the fact that size estimation is not a requirement in 
running the query [1], BigQueryIO should provide a way to circumvent the 
redundant delay, especially in consideration of time-critical enterprise 
workloads.

There can be several ways to address this:
- increasing the timeout threshold (which will still create a tipping point);
- preventing the dry run requests from retrying; or
- adding an option to skip the size estimation within serializeToCloudSource().

[1] 
https://github.com/apache/beam/blob/2ec3b0495c191597c9a88830d25a2c360b3277e0/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/internal/CustomSources.java#L75



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to