[
https://issues.apache.org/jira/browse/BEAM-11208?focusedWorklogId=513812&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-513812
]
ASF GitHub Bot logged work on BEAM-11208:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 18/Nov/20 22:13
Start Date: 18/Nov/20 22:13
Worklog Time Spent: 10m
Work Description: kmjung commented on a change in pull request #13378:
URL: https://github.com/apache/beam/pull/13378#discussion_r526457024
##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageStreamSource.java
##########
@@ -178,6 +179,7 @@ private BigQueryStorageStreamReader(
       this.parseFn = source.parseFn;
       this.storageClient = source.bqServices.getStorageClient(options);
       this.tableSchema = fromJsonString(source.jsonTableSchema, TableSchema.class);
+      this.splitPossible = true;
Review comment:
Nit: I would just initialize this in the declaration above (e.g.
```private boolean splitPossible = true;```) rather than in the constructor.
##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageStreamSource.java
##########
@@ -288,78 +290,85 @@ public synchronized void close() {
         return null;
       }
-      SplitReadStreamRequest splitRequest =
-          SplitReadStreamRequest.newBuilder()
-              .setName(source.readStream.getName())
-              .setFraction((float) fraction)
-              .build();
-
-      SplitReadStreamResponse splitResponse = storageClient.splitReadStream(splitRequest);
-      if (!splitResponse.hasPrimaryStream() || !splitResponse.hasRemainderStream()) {
-        // No more splits are possible!
-        Metrics.counter(
-                BigQueryStorageStreamReader.class,
-                "split-at-fraction-calls-failed-due-to-impossible-split-point")
-            .inc();
-        LOG.info(
-            "BigQuery Storage API stream {} cannot be split at {}.",
-            source.readStream.getName(),
-            fraction);
-        return null;
-      }
+      if (splitPossible) {
Review comment:
Nit: I would structure this check as an early return (e.g.
```if (!splitPossible) { return null; }```) rather than putting the whole
implementation into a new block.
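
The two nits above can be sketched together as follows. This is a minimal illustration, not the code in the PR: `SplitClient`, `GuardedReader`, and the convention of returning `null` when no split is possible are hypothetical stand-ins for the storage client and the stream reader. Only the `splitPossible` field name comes from the diff.

```java
/** Hypothetical stand-in for the storage client; not the Beam or BigQuery API. */
interface SplitClient {
  /** Returns the name of the remainder stream, or null if no split is possible. */
  String split(String streamName, double fraction);
}

/** Hypothetical reader showing the field initializer and the early return. */
class GuardedReader {
  private final SplitClient client;
  private final String streamName;

  // Initialized in the declaration rather than in the constructor.
  private boolean splitPossible = true;

  GuardedReader(SplitClient client, String streamName) {
    this.client = client;
    this.streamName = streamName;
  }

  /** Returns the remainder stream name, or null if the split was refused. */
  String splitAtFraction(double fraction) {
    // Early return: the rest of the method stays at its original indentation.
    if (!splitPossible) {
      return null;
    }

    String remainder = client.split(streamName, fraction);
    if (remainder == null) {
      // Remember the failure so that later calls skip the client entirely.
      splitPossible = false;
      return null;
    }
    return remainder;
  }
}
```

With the guard clause first, the existing split logic does not need to move into a new block, and once the client reports that no split is possible, later calls return without touching the client.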
Issue Time Tracking
-------------------
Worklog Id: (was: 513812)
Remaining Estimate: 167h 40m (was: 167h 50m)
Time Spent: 20m (was: 10m)
> BigQuery storage streams fail with QUOTA_EXCEEDED errors in split
> -----------------------------------------------------------------
>
> Key: BEAM-11208
> URL: https://issues.apache.org/jira/browse/BEAM-11208
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Affects Versions: 2.25.0
> Reporter: Kenneth Jung
> Assignee: Vachan Shetty
> Priority: P2
> Original Estimate: 168h
> Time Spent: 20m
> Remaining Estimate: 167h 40m
>
> The BigQueryStorageStreamSource calls
> [SplitReadStream](https://cloud.google.com/bigquery/docs/reference/storage/rpc/google.cloud.bigquery.storage.v1#bigqueryread)
> on every splitAtFraction call. However, the storage API
> [limits](https://cloud.google.com/bigquery/quotas#storage-limits) the number
> of control plane operations per minute for a given project, and a large
> pipeline can exceed that limit, causing streams to fail with QUOTA_EXCEEDED
> errors. The stream reader should not attempt to split a stream once it has
> learned that the stream can no longer be split (e.g. once a SplitReadStream
> call returns an empty response).
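
To see why remembering a failed split matters for the quota, here is a small simulation. It assumes a stream that can never be split and only counts simulated control plane calls. `SplitCallCounter` and `controlPlaneCalls` are hypothetical names, and no real BigQuery API is invoked.

```java
/** Hypothetical simulation; it counts calls and does not talk to BigQuery. */
class SplitCallCounter {
  /**
   * Counts the simulated control plane calls issued for the given number of
   * split attempts against a stream that can never be split.
   */
  static int controlPlaneCalls(int attempts, boolean rememberFailure) {
    boolean splitPossible = true;
    int calls = 0;
    for (int i = 0; i < attempts; i++) {
      if (rememberFailure && !splitPossible) {
        continue; // Skip the call: the stream is already known to be unsplittable.
      }
      calls++; // Simulated SplitReadStream call that returns an empty response.
      splitPossible = false;
    }
    return calls;
  }

  public static void main(String[] args) {
    System.out.println(controlPlaneCalls(1000, false)); // prints 1000
    System.out.println(controlPlaneCalls(1000, true)); // prints 1
  }
}
```

In this simplified model, a reader that forgets the empty response spends one control plane operation per split attempt, while a reader that remembers it spends exactly one for the lifetime of the stream.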