[ https://issues.apache.org/jira/browse/BEAM-13931?focusedWorklogId=725532&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-725532 ]
ASF GitHub Bot logged work on BEAM-13931:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 12/Feb/22 00:35
Start Date: 12/Feb/22 00:35
Worklog Time Spent: 10m
Work Description: chamikaramj commented on a change in pull request #16838:
URL: https://github.com/apache/beam/pull/16838#discussion_r805086385
##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java
##########
@@ -985,6 +985,23 @@ public void deleteDataset(String projectId, String datasetId)
       // impossible to insert into BigQuery, and so we send it out through the dead-letter
       // queue.
       if (nextRowSize >= maxRowBatchSize) {
+        Boolean isRetryAlways = false;
+        try {
+          // We verify whether the retryPolicy parameter is "retryAlways". If it is, then
+          // it will return true. Otherwise it will return false, or it may throw an NPE,
+          // which we need to catch and ignore.
+          isRetryAlways = retryPolicy.shouldRetry(null);
+        } catch (NullPointerException e) {
+          // We do not need to do anything about this exception, as it was expected.
+        }
+        if (isRetryAlways) {
Review comment:
       Instead of this, how about just letting this request go through and letting BQ fail? If it fails, it will fail either way, and letting BQ fail will make sure that there's no regression.
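
       To make the suggestion concrete, here is a minimal, self-contained sketch of the "submit and let the backend fail" shape; all names here are hypothetical stand-ins, not Beam's real API:

       import java.util.ArrayList;
       import java.util.List;

       public class SubmitAndLetBackendFail {

         // Stand-in for the insert RPC; the real client would call BigQuery and
         // receive the backend's own size-limit error in the response.
         static boolean tryInsert(String row, long backendLimitBytes) {
           return row.getBytes().length <= backendLimitBytes;
         }

         public static void main(String[] args) {
           long backendLimit = 16; // tiny stand-in for BigQuery's 10MB request limit
           List<String> deadLetter = new ArrayList<>();
           for (String row : new String[] {"small row", "a row that is far too large"}) {
             // No client-side size rejection: the backend's verdict is authoritative,
             // so rows the backend would have accepted can never regress.
             if (!tryInsert(row, backendLimit)) {
               deadLetter.add(row); // real code would consult the retry policy first
             }
           }
           System.out.println("Dead-lettered: " + deadLetter);
         }
       }
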
##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java
##########
@@ -985,6 +985,23 @@ public void deleteDataset(String projectId, String datasetId)
       // impossible to insert into BigQuery, and so we send it out through the dead-letter
       // queue.
       if (nextRowSize >= maxRowBatchSize) {
+        Boolean isRetryAlways = false;
+        try {
+          // We verify whether the retryPolicy parameter is "retryAlways". If it is, then
+          // it will return true. Otherwise it will return false, or it may throw an NPE,
+          // which we need to catch and ignore.
+          isRetryAlways = retryPolicy.shouldRetry(null);
+        } catch (NullPointerException e) {
+          // We do not need to do anything about this exception, as it was expected.
Review comment:
       When would it throw an NPE? If it can, I think you are probably not using that method according to its spec.
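
       To illustrate the point, here is a minimal sketch (hypothetical types, not Beam's real InsertRetryPolicy API) that makes "does this policy retry everything?" an explicit query instead of calling shouldRetry(null) and treating the resulting NullPointerException as a signal:

       public class ExplicitPolicyCheck {

         // Simplified stand-in for Beam's InsertRetryPolicy.
         abstract static class RetryPolicy {
           abstract boolean shouldRetry(String error);

           // Explicit capability query; policies that ignore the error override it.
           boolean retriesAllErrors() {
             return false;
           }

           static RetryPolicy alwaysRetry() {
             return new RetryPolicy() {
               @Override
               boolean shouldRetry(String error) {
                 return true; // never inspects the error
               }

               @Override
               boolean retriesAllErrors() {
                 return true;
               }
             };
           }

           static RetryPolicy retryTransientErrors() {
             return new RetryPolicy() {
               @Override
               boolean shouldRetry(String error) {
                 // Inspects the error, so shouldRetry(null) would throw an NPE here.
                 return error.contains("transient");
               }
             };
           }
         }

         public static void main(String[] args) {
           // The explicit query makes no null call and needs no catch block.
           System.out.println(RetryPolicy.alwaysRetry().retriesAllErrors());          // true
           System.out.println(RetryPolicy.retryTransientErrors().retriesAllErrors()); // false
         }
       }
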
##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java
##########
@@ -985,6 +985,23 @@ public void deleteDataset(String projectId, String datasetId)
       // impossible to insert into BigQuery, and so we send it out through the dead-letter
       // queue.
       if (nextRowSize >= maxRowBatchSize) {
+        Boolean isRetryAlways = false;
+        try {
+          // We verify whether the retryPolicy parameter is "retryAlways". If it is, then
+          // it will return true. Otherwise it will return false, or it may throw an NPE,
+          // which we need to catch and ignore.
+          isRetryAlways = retryPolicy.shouldRetry(null);
+        } catch (NullPointerException e) {
+          // We do not need to do anything about this exception, as it was expected.
+        }
+        if (isRetryAlways) {
+          throw new RuntimeException(
+              String.format(
+                  "We have observed a row that is %s bytes in size. BigQuery supports"
+                      + " request sizes up to 10MB. You can set the pipeline option"
+                      + " maxStreamingBatchSize to a larger number to unblock this pipeline.",
+                  nextRowSize));
+        }
         errorContainer.add(
Review comment:
       How about only actively failing for rows that are larger than 10MB (the BQ RPC limit), which would surely fail when submitted? For most users, the configured "maxRowBatchSize" will be much smaller than the actual BQ limit.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 725532)
Time Spent: 0.5h (was: 20m)
> BigQueryIO is sending rows that are too large to Deadletter Queue even on
> RETRY_ALWAYS
> --------------------------------------------------------------------------------------
>
> Key: BEAM-13931
> URL: https://issues.apache.org/jira/browse/BEAM-13931
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Affects Versions: 2.35.0, 2.36.0
> Reporter: Pablo Estrada
> Assignee: Pablo Estrada
> Priority: P0
> Fix For: 2.37.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Note that BQ does not support requests over a certain size, and rows that
> exceed that size may be output to a dead-letter queue, from which users can
> retrieve them with
> [BigQueryIO.Write.Result.getFailedInsertsWithErr|https://beam.apache.org/releases/javadoc/2.36.0/org/apache/beam/sdk/io/gcp/bigquery/WriteResult.html#getFailedInsertsWithErr--]
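>
> A minimal usage sketch (the table and input are hypothetical; note that
> withExtendedErrorInfo() must be set on the write for
> getFailedInsertsWithErr() to be available):
>
> import com.google.api.services.bigquery.model.TableRow;
> import org.apache.beam.sdk.Pipeline;
> import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
> import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
> import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
> import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
> import org.apache.beam.sdk.transforms.Create;
> import org.apache.beam.sdk.values.PCollection;
>
> public class FailedInsertsExample {
>   public static void main(String[] args) {
>     Pipeline p = Pipeline.create();
>     PCollection<TableRow> rows =
>         p.apply(Create.of(new TableRow().set("f", "v")).withCoder(TableRowJsonCoder.of()));
>     WriteResult result =
>         rows.apply(
>             BigQueryIO.writeTableRows()
>                 .to("my-project:my_dataset.my_table") // hypothetical table
>                 .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
>                 .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
>                 .withExtendedErrorInfo() // required for getFailedInsertsWithErr()
>                 .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors()));
>     // Rows BigQuery rejected (and the policy declined to retry) land here,
>     // each as a BigQueryInsertError carrying the row and its insert error.
>     result.getFailedInsertsWithErr();
>     p.run().waitUntilFinish();
>   }
> }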
> A change went into Beam that outputs rows into the BQIO DLQ even if they're
> meant to be retried indefinitely.
> [https://github.com/apache/beam/commit/1f08d1f3ddc2e7bc7341be4b29bdafaec18de9cc#diff-26dbe8f625f702ae3edacdbc02b12acc6e423542fe16835229e22ef8eb4e109cR979-R989]
>
>
> A workaround is to set this pipeline option to a larger amount:
> [https://github.com/apache/beam/blob/1f08d1f3ddc2e7bc7341be4b29bdafaec18de9cc/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java#L70]
>
> Currently it's 64KB, which is relatively small. Setting it to 1MB or 5MB
> should work around the issue (it should be larger than the maximum row
> size); gRPC should support request sizes of up to 10MB.
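>
> A minimal sketch of applying the workaround (the 5MB value is illustrative,
> and this assumes the option's generated setter takes a Long, per the
> BigQueryOptions definition linked above):
>
> import org.apache.beam.sdk.io.gcp.bigquery.BigQueryOptions;
> import org.apache.beam.sdk.options.PipelineOptionsFactory;
>
> public class MaxBatchSizeWorkaround {
>   public static void main(String[] args) {
>     // Equivalent command-line form: --maxStreamingBatchSize=5242880
>     BigQueryOptions options =
>         PipelineOptionsFactory.fromArgs(args).as(BigQueryOptions.class);
>     options.setMaxStreamingBatchSize(5L * 1024 * 1024);
>     System.out.println("maxStreamingBatchSize = " + options.getMaxStreamingBatchSize());
>   }
> }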
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)