[
https://issues.apache.org/jira/browse/BEAM-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974735#comment-16974735
]
Guenther Starnberger commented on BEAM-8651:
--------------------------------------------
I've attached a script that reproduces this issue. I'm not able to reproduce it
on Python 2.7.x or on Python 3.7.3 (or higher), but it seems to be reproducible
on all earlier 3.7 releases, as well as on all releases of Python 3.5 and 3.6.
To run the script (with a particular version of Python):
{{docker run -v "$PWD":/py -it python:3.7.2 python /py/beam8651.py 50}}
(The argument specifies the number of threads that are used.)
On Beam we've only been running into this issue with a --parallelism setting
larger than 1, so setting parallelism to 1 could be a potential workaround.
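For readers without access to the attachment, the kind of race described here can be sketched roughly as follows. This is not the attached beam8651.py; it is a minimal illustration that assumes the problem is triggered by many threads unpickling a reference to a class whose module has not been imported yet. The module name, the class name, and the artificial sleep inside the module (added only to widen the window in which the module is partially imported) are all hypothetical.

```python
import importlib
import os
import pickle
import sys
import tempfile
import threading


def count_unpickle_failures(num_threads=50):
    """Unpickle a reference to a not-yet-imported class from many threads.

    Returns the number of threads that failed with AttributeError.
    """
    module_name = "beam8651_slow_module"  # hypothetical module name
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, module_name + ".py"), "w") as f:
            # The sleep is an assumption of this sketch: it delays the class
            # definition so that other threads can observe a partial module.
            f.write(
                "import time\n"
                "time.sleep(0.2)\n"
                "class ClassName(object):\n"
                "    pass\n"
            )
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        sys.modules.pop(module_name, None)

        # Protocol-0 pickle consisting of a GLOBAL opcode and STOP; loading
        # it makes the unpickler import the module and look up ClassName.
        payload = ("c%s\nClassName\n." % module_name).encode("ascii")

        errors = []
        barrier = threading.Barrier(num_threads)

        def worker():
            barrier.wait()  # release all threads at the same moment
            try:
                pickle.loads(payload)
            except AttributeError as exc:
                errors.append(exc)

        threads = [threading.Thread(target=worker) for _ in range(num_threads)]
        try:
            for t in threads:
                t.start()
            for t in threads:
                t.join()
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(module_name, None)
    return len(errors)


if __name__ == "__main__":
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 50
    print("threads with AttributeError:", count_unpickle_failures(n))
```

On Python 3.7.3 or higher this sketch is expected to report zero failures. Whether it reports failures on the affected versions listed above has not been verified here; the attached script remains the authoritative reproduction.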
> Python 3 portable pipelines sometimes fail with errors in
> StockUnpickler.find_class()
> -------------------------------------------------------------------------------------
>
> Key: BEAM-8651
> URL: https://issues.apache.org/jira/browse/BEAM-8651
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-core
> Reporter: Valentyn Tymofieiev
> Assignee: Valentyn Tymofieiev
> Priority: Major
> Attachments: beam8651.py
>
>
> Several Beam users [1,2] reported an error which happens on Python 3 in
> StockUnpickler.find_class.
> So far I've seen reports of the error on Python 3.5, 3.6, and 3.7.1, on the
> Flink and Dataflow runners. On the Dataflow runner I have so far seen this
> only in streaming pipelines, which use the portable SDK worker.
> Typical stack trace:
> {noformat}
>   File "python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1148, in _create_pardo_operation
>     dofn_data = pickler.loads(serialized_fn)
>   File "python3.5/site-packages/apache_beam/internal/pickler.py", line 265, in loads
>     return dill.loads(s)
>   File "python3.5/site-packages/dill/_dill.py", line 317, in loads
>     return load(file, ignore)
>   File "python3.5/site-packages/dill/_dill.py", line 305, in load
>     obj = pik.load()
>   File "python3.5/site-packages/dill/_dill.py", line 474, in find_class
>     return StockUnpickler.find_class(self, module, name)
> AttributeError: Can't get attribute 'ClassName' on <module 'ModuleName' from 'python3.5/site-packages/filename.py'>
> {noformat}
> According to Guenther from [1]:
> {quote}
> This looks exactly like a race condition that we've encountered on Python
> 3.7.1: There's a bug in some older 3.7.x releases that breaks the
> thread-safety of the unpickler, as concurrent unpickle threads can access a
> module before it has been fully imported. See
> https://bugs.python.org/issue34572 for more information.
> The traceback shows a Python 3.6 venv so this could be a different issue
> (the unpickle bug was introduced in version 3.7). If it's the same bug then
> upgrading to Python 3.7.3 or higher should fix that issue. One potential
> workaround is to ensure that all of the modules get imported during the
> initialization of the sdk_worker, as this bug only affects imports done by
> the unpickler.
> {quote}
> Opening this for visibility. Current open questions are:
> 1. Find a minimal example to reproduce this issue.
> 2. Figure out whether users are still affected by this issue on Python 3.7.3.
> 3. Communicate workarounds to 3.5 and 3.6 users affected by this.
> [1]
> https://lists.apache.org/thread.html/5581ddfcf6d2ae10d25b834b8a61ebee265ffbcf650c6ec8d1e69408@%3Cdev.beam.apache.org%3E
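The workaround suggested in the quoted description (making sure all modules are imported during worker initialization, so that the unpickler never triggers an import itself) could look roughly like the sketch below. The helper name and the module list are hypothetical, and where such a call would be placed in the Beam SDK worker startup is not specified in this issue.

```python
import importlib


def preimport_modules(module_names):
    """Eagerly import modules from a single thread before unpickling starts.

    Returns a dict mapping each module name to the imported module object.
    """
    imported = {}
    for name in module_names:
        imported[name] = importlib.import_module(name)
    return imported


# Hypothetical list: in practice this would name the modules that define
# the classes and functions referenced by the pickled DoFns.
MODULES_USED_BY_PIPELINE = ["json", "decimal"]

preimport_modules(MODULES_USED_BY_PIPELINE)
```

Because the modules are already fully initialized in sys.modules by the time worker threads start unpickling, the lookup in find_class should no longer race with an in-progress import. This is a sketch of the idea only, not a tested fix for this issue.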