[ https://issues.apache.org/jira/browse/BEAM-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joar Wandborg updated BEAM-7540:
--------------------------------
Description:
If you set {{save_main_session = True}} and have a logging.Logger instance in
your __main__ module, calling a logger method *after* Pipeline.run has been
called causes the process to hang and never exit.
Python 3 Pipeline that reproduces the error (code also available at
https://gist.github.com/joar/f021db55eca4fa9e9fd7dfd67cc011b9):
{code:java}
import logging

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

_log = logging.getLogger(__name__)


def main(argv=None):
    logging.basicConfig(level=logging.INFO)
    pipeline_options = PipelineOptions(argv)
    setup_options = pipeline_options.view_as(SetupOptions)  # type: SetupOptions
    setup_options.save_main_session = True

    _log.info("Running pipeline")
    with beam.Pipeline(runner="DirectRunner", options=pipeline_options) as p:
        p | beam.Create(["hello", "world"]) | beam.Map(lambda x: print(x))

    print("""
    Call to _log.info will now deadlock, since the logging handler's
    threading.RLock() has been passed through dill.

    When you press Ctrl-C, the traceback should confirm that the process is
    stuck at:

      File "/usr/lib/python3.5/logging/__init__.py", line 810, in acquire
        self.lock.acquire()
    """)
    _log.info("Pipeline done")
    print("Launching nukes")


if __name__ == '__main__':
    main()
{code}
I have opened an issue with {{dill}} as well:
[https://github.com/uqfoundation/dill/issues/321]
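
Pressing Ctrl-C is one way to see where the process is stuck. As a possible alternative, the standard library's faulthandler module can dump every thread's traceback automatically after a timeout. The following is a minimal sketch and not part of the original reproduction; the helper names arm_hang_watchdog and disarm_hang_watchdog and the 30-second default are hypothetical choices made for illustration.

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout_s=30.0, stream=None):
    """Schedule a traceback dump of all threads after `timeout_s` seconds.

    Hypothetical helper: `stream` must be a real file with a file
    descriptor (sys.stderr by default), because faulthandler writes
    to the descriptor directly.
    """
    if stream is None:
        stream = sys.stderr
    # exit=False: only report where each thread is; do not kill the process.
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=stream)


def disarm_hang_watchdog():
    """Cancel the pending dump once the suspect call has returned."""
    faulthandler.cancel_dump_traceback_later()
```

In the reproduction above, one would presumably call arm_hang_watchdog() just before _log.info("Pipeline done") and disarm_hang_watchdog() right after it. If the logging call hangs, the dump should show which frame each thread is waiting in, without needing to interrupt the process by hand.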
was:
If you set {{save_main_session = True}} and have a logging.Logger instance in
your __main__ module, calling a logger method *after* Pipeline.run has been
called, the process will hang and never exit.
Python 3 Pipeline that reproduces the error:
{code}
import logging

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

_log = logging.getLogger(__name__)


def main(argv=None):
    logging.basicConfig(level=logging.INFO)
    pipeline_options = PipelineOptions(argv)
    setup_options = pipeline_options.view_as(SetupOptions)  # type: SetupOptions
    setup_options.save_main_session = True

    _log.info("Running pipeline")
    with beam.Pipeline(runner="DirectRunner", options=pipeline_options) as p:
        p | beam.Create(["hello", "world"]) | beam.Map(lambda x: print(x))

    print("""
    Call to _log.info will now deadlock, since the logging handler's
    threading.RLock() has been passed through dill.

    When you press Ctrl-C, the traceback should confirm that the process is
    stuck at:

      File "/usr/lib/python3.5/logging/__init__.py", line 810, in acquire
        self.lock.acquire()
    """)
    _log.info("Pipeline done")
    print("Launching nukes")


if __name__ == '__main__':
    main()
{code}
I have opened an issue with {{dill}} as well:
https://github.com/uqfoundation/dill/issues/321
> deadlock using save_main_session and logging
> --------------------------------------------
>
> Key: BEAM-7540
> URL: https://issues.apache.org/jira/browse/BEAM-7540
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Environment: Python 3.5
> Linux
> apache-beam 2.12.0
> Reporter: Joar Wandborg
> Priority: Major