June Oh created BEAM-8906:
-----------------------------
Summary: Long BigQuery dry runs cause avalanche delay
Key: BEAM-8906
URL: https://issues.apache.org/jira/browse/BEAM-8906
Project: Beam
Issue Type: Bug
Components: io-java-gcp
Affects Versions: 2.16.0
Environment: Google Cloud Platform
Reporter: June Oh
Reproduction Steps:
1. Compose a BigQuery SELECT query that will take over 80 seconds for a dry run.
2. Run the query with Beam SDK's BigQueryIO.
3. Observe the 10+ minute delay before the actual query job is created.
When running readTableRows(), BigQueryIO attempts to estimate the query size by
performing a dry run, even if withoutValidation() is set. If the request takes
over 80 seconds (RetryHttpRequestInitializer.HANGING_GET_TIMEOUT_SEC),
RetryHttpRequestInitializer will time out and retry, up to 9 times
(BigQueryServicesImpl.MAX_RPC_RETRIES). Hence, once a dry run duration crosses
the 80 second tipping point, it causes an inevitable avalanche of a 720-second
delay. Considering the fact that size estimation is not a requirement in
running the query [1], BigQueryIO should provide a way to circumvent the
redundant delay, especially in consideration of time-critical enterprise
workloads.
There can be several ways to address this:
- increasing the timeout threshold (which will still create a tipping point);
- preventing the dry run requests from retrying; or
- adding an option to skip the size estimation within serializeToCloudSource().
[1]
https://github.com/apache/beam/blob/2ec3b0495c191597c9a88830d25a2c360b3277e0/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/internal/CustomSources.java#L75
--
This message was sent by Atlassian Jira
(v8.3.4#803005)