[
https://issues.apache.org/jira/browse/BEAM-5514?focusedWorklogId=172840&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172840
]
ASF GitHub Bot logged work on BEAM-5514:
----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Dec/18 21:21
Start Date: 06/Dec/18 21:21
Worklog Time Spent: 10m
Work Description: chamikaramj commented on a change in pull request
#7189: [BEAM-5514] BigQueryIO doesn't handle quotaExceeded errors properly
URL: https://github.com/apache/beam/pull/7189#discussion_r239618701
##########
File path:
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java
##########
@@ -736,17 +736,21 @@ public void deleteDataset(String projectId, String datasetId)
try {
return insert.execute().getInsertErrors();
} catch (IOException e) {
- if (new ApiErrorExtractor().rateLimited(e)) {
+ if (ApiErrorExtractor.INSTANCE.rateLimited(e)) {
LOG.info("BigQuery insertAll exceeded rate limit, retrying");
- try {
- sleeper.sleep(backoff1.nextBackOffMillis());
- } catch (InterruptedException interrupted) {
- throw new IOException(
- "Interrupted while waiting before retrying insertAll");
- }
+ } else if (ApiErrorExtractor.INSTANCE
+ .getErrorMessage(e)
+ .startsWith("Quota exceeded")) {
Review comment:
The error code rateLimitExceeded is well documented, so checking for it will
probably be less brittle than matching the text "Quota exceeded".
https://cloud.google.com/bigquery/troubleshooting-errors
Also, I think the main concern so far has been Beam sending a large number of
messages to BQ even after the BQ service raises quota exceeded errors. I think
this will be somewhat exacerbated by introducing exponential backoff here (more
messages before the 10 second failed workitem wait), so this has to be combined
with a solution where we perform exponential backoff across all BQ streaming
write threads started by a given workitem (which can be a separate PR).
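To make the cross-thread backoff idea concrete, here is a minimal sketch of a backoff state shared by every streaming-write thread of one workitem. The class `SharedBackoff` and all of its method names are hypothetical and are not part of Beam or of this PR. It assumes plain doubling with a cap, omits jitter, and leaves the actual sleeping to the caller.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch: one backoff state shared by all threads of a workitem.
 * Not Beam code; names and policy (doubling with a cap, no jitter) are assumptions.
 */
final class SharedBackoff {
  private final long initialMillis;
  private final long maxMillis;
  // Consecutive quota errors observed by any thread since the last success.
  private final AtomicInteger failures = new AtomicInteger();

  SharedBackoff(long initialMillis, long maxMillis) {
    this.initialMillis = initialMillis;
    this.maxMillis = maxMillis;
  }

  /** Called by a thread that just hit a quota error; returns its wait time. */
  long recordFailureAndGetDelayMillis() {
    return delayFor(failures.incrementAndGet());
  }

  /** Wait time other threads should honor before sending; 0 if no failure is outstanding. */
  long currentDelayMillis() {
    return delayFor(failures.get());
  }

  /** Called after a successful request; resets the shared state. */
  void recordSuccess() {
    failures.set(0);
  }

  private long delayFor(int failureCount) {
    if (failureCount <= 0) {
      return 0L;
    }
    // Cap the shift so the doubling cannot overflow for reasonable initial delays.
    int shift = Math.min(failureCount - 1, 30);
    return Math.min(maxMillis, initialMillis << shift);
  }
}
```

Because every thread consults the same counter, a quota error seen by one thread also slows the others, rather than each thread starting its own backoff sequence from the shortest delay.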
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 172840)
Time Spent: 2h 10m (was: 2h)
> BigQueryIO doesn't handle quotaExceeded errors properly
> -------------------------------------------------------
>
> Key: BEAM-5514
> URL: https://issues.apache.org/jira/browse/BEAM-5514
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Reporter: Kevin Peterson
> Assignee: Heejong Lee
> Priority: Major
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> When exceeding a streaming quota for BigQuery insertAll requests, BigQuery
> returns a 403 with reason "quotaExceeded".
> The current implementation of BigQueryIO does not consider this to be a rate
> limited exception, and therefore does not perform exponential backoff
> properly, leading to repeated calls to BQ.
> The actual error is in the
> [ApiErrorExtractor|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263]
> class, which is called from
> [BigQueryServicesImpl|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739]
> to determine how to retry the failure.
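The classification problem described in the issue can be sketched as a check on the structured reason code rather than on the message text. This is only an illustration under stated assumptions: `QuotaErrorClassifier` and `shouldBackOff` are hypothetical names, not the ApiErrorExtractor API, and the caller is assumed to have already extracted the HTTP status and reason string from the error response. The two reason strings are the ones listed for 403 responses in the BigQuery troubleshooting table.

```java
import java.util.Set;

/**
 * Hypothetical helper, not the ApiErrorExtractor API: decides whether an
 * insertAll failure should trigger backoff, based on status and reason code.
 */
final class QuotaErrorClassifier {
  // Reason codes documented by BigQuery for 403 quota and rate-limit errors.
  private static final Set<String> BACKOFF_REASONS =
      Set.of("rateLimitExceeded", "quotaExceeded");

  private QuotaErrorClassifier() {}

  /** Returns true when the (status, reason) pair indicates a quota or rate-limit error. */
  static boolean shouldBackOff(int httpStatus, String reason) {
    // Null check first: Set.of(...).contains(null) would throw.
    return httpStatus == 403 && reason != null && BACKOFF_REASONS.contains(reason);
  }
}
```

Matching on the reason code keeps the check independent of the human-readable message, which may change without notice.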
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)