[
https://issues.apache.org/jira/browse/BEAM-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16951812#comment-16951812
]
Maximilian Michels commented on BEAM-8403:
------------------------------------------
Also adding the error that occurs when the race condition is hit. The same
request id is used twice, and a KeyError is raised the second time it is
removed from the pending responses.
{noformat}
  File "/venv/lib/python3.6/site-packages/apache_beam/runners/worker/sdk_worker.py", line 573, in pull_responses
    future = self._responses_by_id.pop(response.id)
KeyError: KeyError('3884706',)
{noformat}
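To illustrate the failure mode, here is a minimal sketch that is independent of Beam. The names responses_by_id, register, and complete are hypothetical stand-ins and not the actual sdk_worker.py code. It simulates two requests that end up with the same id: the second registration overwrites the first, so the second removal has nothing left to pop.

```python
from concurrent.futures import Future

# Hypothetical stand-in for the handler's map of pending responses.
responses_by_id = {}


def register(request_id):
    """Register a pending request; a duplicate id overwrites the earlier future."""
    future = Future()
    responses_by_id[request_id] = future
    return future


def complete(request_id, payload):
    """Resolve and remove a pending request, as a response reader might do."""
    future = responses_by_id.pop(request_id)
    future.set_result(payload)


first = register("3884706")
second = register("3884706")  # simulated id collision caused by the race

complete("3884706", "response A")  # resolves only the second future

try:
    complete("3884706", "response B")  # the id was already removed
    duplicate_error = None
except KeyError as err:
    duplicate_error = err
```

In this sketch the first future is never resolved, which suggests that a caller waiting on it would hang in addition to the KeyError being raised in the reader.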
> Race condition in request id generation of GrpcStateRequestHandler
> ------------------------------------------------------------------
>
> Key: BEAM-8403
> URL: https://issues.apache.org/jira/browse/BEAM-8403
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-harness
> Reporter: Maximilian Michels
> Assignee: Maximilian Michels
> Priority: Major
> Fix For: 2.17.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> There is a race condition in {{GrpcStateRequestHandler}} that surfaced after
> the recent changes to process append/clear state requests asynchronously. The
> race condition can occur if multiple Runner workers process a transform with
> state requests against the same SDK Harness. For example, this setup occurs
> with Flink when a TaskManager has multiple task slots and two or more of those
> slots process the same stateful stage against an SDK Harness.
> CC [~robertwb]
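One common way to avoid this kind of race is to guard the id counter with a lock, so that incrementing and reading it happen as a single step. The sketch below only illustrates that general technique; it is not necessarily the fix applied for this issue, and RequestIdGenerator and collect_ids are hypothetical names.

```python
import threading


class RequestIdGenerator:
    """Hypothetical id generator; the lock makes increment-and-read atomic."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last_id = 0

    def next_id(self):
        # Without the lock, two threads could observe the same counter value
        # and hand out the same request id.
        with self._lock:
            self._last_id += 1
            return str(self._last_id)


def collect_ids(generator, count, out):
    """Request `count` ids and append them to the shared list `out`."""
    for _ in range(count):
        out.append(generator.next_id())


generator = RequestIdGenerator()
ids = []
threads = [
    threading.Thread(target=collect_ids, args=(generator, 1000, ids))
    for _ in range(8)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

With the lock in place, every id requested by the eight threads should be distinct.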