[
https://issues.apache.org/jira/browse/BEAM-5514?focusedWorklogId=172828&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172828
]
ASF GitHub Bot logged work on BEAM-5514:
----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Dec/18 20:50
Start Date: 06/Dec/18 20:50
Worklog Time Spent: 10m
Work Description: ihji commented on a change in pull request #7189:
[BEAM-5514] BigQueryIO doesn't handle quotaExceeded errors properly
URL: https://github.com/apache/beam/pull/7189#discussion_r239609349
##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java
##########
@@ -736,17 +736,21 @@ public void deleteDataset(String projectId, String datasetId)
       try {
         return insert.execute().getInsertErrors();
       } catch (IOException e) {
-        if (new ApiErrorExtractor().rateLimited(e)) {
+        if (ApiErrorExtractor.INSTANCE.rateLimited(e)) {
           LOG.info("BigQuery insertAll exceeded rate limit, retrying");
-          try {
-            sleeper.sleep(backoff1.nextBackOffMillis());
-          } catch (InterruptedException interrupted) {
-            throw new IOException(
-                "Interrupted while waiting before retrying insertAll");
-          }
+        } else if (ApiErrorExtractor.INSTANCE
+            .getErrorMessage(e)
+            .startsWith("Quota exceeded")) {
Review comment:
AFAIK, the worker will fail and retry on all other errors after ten seconds
anyway. The question here is whether the given error should be retried silently
(without an explicit error log) with exponential backoff or not. I think it
makes sense to use exponential backoff for `quota exceeded` and `rate limit
exceeded` errors, since they are transient and there's a high chance they will
resolve themselves within the next few retries. I'm not sure the same holds true
for other possible errors like `field size too large`, `unauthorized`, or
`user project missing`.
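
For readers following along, here is a minimal, self-contained sketch (not the
PR's code) of the retry policy being argued for: treat rate-limit and quota
errors as transient and retry them silently with exponential backoff, and let
every other IOException propagate so the worker fails and retries on its normal
schedule. The class and method names (`TransientInsertRetrySketch`,
`isTransient`, `callWithBackOff`) are hypothetical; only `ApiErrorExtractor`
and the google-http-client backoff utilities are assumed.

```java
import com.google.api.client.util.BackOff;
import com.google.api.client.util.BackOffUtils;
import com.google.api.client.util.ExponentialBackOff;
import com.google.api.client.util.Sleeper;
import com.google.cloud.hadoop.util.ApiErrorExtractor;
import java.io.IOException;
import java.util.concurrent.Callable;

public class TransientInsertRetrySketch {

  // Hypothetical predicate mirroring the checks in the diff above: an error is
  // transient if it is rate-limited or its message starts with "Quota exceeded".
  static boolean isTransient(IOException e) {
    ApiErrorExtractor extractor = ApiErrorExtractor.INSTANCE;
    return extractor.rateLimited(e)
        || extractor.getErrorMessage(e).startsWith("Quota exceeded");
  }

  // Runs the call, retrying with exponential backoff only on transient errors.
  static <T> T callWithBackOff(Callable<T> call) throws Exception {
    BackOff backoff = new ExponentialBackOff(); // defaults: ~0.5s initial, 1.5x growth
    Sleeper sleeper = Sleeper.DEFAULT;
    while (true) {
      try {
        return call.call();
      } catch (IOException e) {
        if (!isTransient(e)) {
          throw e; // permanent error: let the worker fail and retry as usual
        }
        if (!BackOffUtils.next(sleeper, backoff)) {
          throw e; // backoff exhausted: give up
        }
        // transient error: sleep was done by BackOffUtils.next, loop and retry silently
      }
    }
  }
}
```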
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 172828)
Time Spent: 1h 50m (was: 1h 40m)
> BigQueryIO doesn't handle quotaExceeded errors properly
> -------------------------------------------------------
>
> Key: BEAM-5514
> URL: https://issues.apache.org/jira/browse/BEAM-5514
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Reporter: Kevin Peterson
> Assignee: Heejong Lee
> Priority: Major
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> When exceeding a streaming quota for BigQuery insertAll requests, BigQuery
> returns a 403 with reason "quotaExceeded".
> The current implementation of BigQueryIO does not consider this to be a rate
> limited exception, and therefore does not perform exponential backoff
> properly, leading to repeated calls to BQ.
> The actual error is in the
> [ApiErrorExtractor|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263]
> class, which is called from
> [BigQueryServicesImpl|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739]
> to determine how to retry the failure.
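
For context, a short sketch (not part of the original report) of how the
"quotaExceeded" reason appears at the HTTP layer, assuming the standard
google-http-client exception types; the helper name `isQuotaExceeded` is
hypothetical:

```java
import com.google.api.client.googleapis.json.GoogleJsonError;
import com.google.api.client.googleapis.json.GoogleJsonResponseException;
import java.io.IOException;

public class QuotaErrorCheckSketch {

  // Hypothetical helper: true when the IOException is a 403 response whose
  // first error detail carries the reason "quotaExceeded".
  static boolean isQuotaExceeded(IOException e) {
    if (!(e instanceof GoogleJsonResponseException)) {
      return false;
    }
    GoogleJsonResponseException jsonException = (GoogleJsonResponseException) e;
    GoogleJsonError details = jsonException.getDetails();
    return jsonException.getStatusCode() == 403
        && details != null
        && details.getErrors() != null
        && !details.getErrors().isEmpty()
        && "quotaExceeded".equals(details.getErrors().get(0).getReason());
  }
}
```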
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)