[ https://issues.apache.org/jira/browse/BEAM-11326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312674#comment-17312674 ]
Kenneth Knowles commented on BEAM-11326:
----------------------------------------
Based on the PR being merged, I'm assuming this is fixed?
> Enforce deadlines during splitAtFraction in BigQueryStorageStreamSource
> -----------------------------------------------------------------------
>
> Key: BEAM-11326
> URL: https://issues.apache.org/jira/browse/BEAM-11326
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Affects Versions: 2.25.0
> Reporter: Kenneth Jung
> Assignee: Kenneth Jung
> Priority: P2
> Fix For: 2.29.0
>
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> In the
> [BigQueryStorageStreamSource](https://github.com/apache/beam/blob/3bb232fb098700de408f574585dfe74bbaff7230/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageStreamSource.java#L279),
> we perform two RPCs during splitAtFraction: one to split the current stream
> into primary and residual child streams, and a second to validate the
> reader's current offset within the primary stream to ensure that the reader
> has not advanced beyond the split point during the split process. For
> sufficiently large streams -- particularly when combined with selective
> predicate filters -- this process can take longer than the 2-minute limit
> beyond which the Dataflow runtime considers the worker lost, which can
> ultimately cause pipeline execution failures.
> The short-term solution is to apply a single, consistent deadline to both
> RPCs, so that the split operation fails if it takes too long. This does not
> address the potential for sub-optimal parallelism and dynamic work
> rebalancing, but it should at least prevent pipeline execution failures.
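
As a rough illustration of the "consistent deadline for both RPCs" idea described above, the sketch below shares one time budget across two sequential blocking calls and refuses the split if the budget runs out. It is not the code from the merged PR and uses no BigQuery Storage or Beam APIs: the class `SplitDeadlineSketch`, the methods `callBefore` and `trySplit`, and the two `Callable` parameters standing in for the split RPC and the offset-validation RPC are all hypothetical names, and representing both results as plain `long` positions is a simplifying assumption.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch: one deadline shared by two sequential blocking calls. */
final class SplitDeadlineSketch {

  /** Runs one blocking call, allowing it only the time left before the shared deadline. */
  static <T> T callBefore(ExecutorService pool, long deadlineNanos, Callable<T> rpc)
      throws Exception {
    long remaining = deadlineNanos - System.nanoTime();
    if (remaining <= 0) {
      throw new TimeoutException("deadline exhausted before the call started");
    }
    Future<T> future = pool.submit(rpc);
    try {
      return future.get(remaining, TimeUnit.NANOSECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the stand-in call so it does not linger
      throw e;
    }
  }

  /**
   * Returns true only if both calls finish within the budget and the reader
   * offset has not passed the split point. Returning false means the split is
   * refused and the caller keeps reading the original stream.
   */
  static boolean trySplit(Duration budget, Callable<Long> splitRpc, Callable<Long> offsetRpc) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    // The deadline is computed once, so the second call only gets what the first left over.
    long deadlineNanos = System.nanoTime() + budget.toNanos();
    try {
      long splitPoint = callBefore(pool, deadlineNanos, splitRpc);
      long readerOffset = callBefore(pool, deadlineNanos, offsetRpc);
      return readerOffset <= splitPoint;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
      return false;
    } catch (Exception e) {
      return false; // timeout or call failure: fail the split, not the pipeline
    } finally {
      pool.shutdownNow();
    }
  }
}
```

The point of computing the deadline once, rather than giving each call its own timeout, is that the total time spent in the split stays bounded regardless of how the two calls divide it. A real implementation would more likely pass the remaining time to the RPC client itself instead of wrapping calls in an executor; that choice is left open here.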