[ 
https://issues.apache.org/jira/browse/BEAM-8397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958087#comment-16958087
 ] 

Valentyn Tymofieiev edited comment on BEAM-8397 at 10/23/19 6:31 PM:
---------------------------------------------------------------------

It appears that test_remote_runner_display_data triggers infinite recursion in 
_find_containing_class / _find_containing_class_inner methods.

This recursion is present in Python 2 and all supported Python 3 versions, 
however Python is not reliably detecting this recursion, and so the recursion 
causes crashes and/or goes unnoticed depending on Python version and 
sys.getrecursionlimit(). To observe that there is a recursion problem, we can 
add some logging into _find_containing_class_inner. With changes in [1] we have:
{noformat}
python ./setup.py nosetests --nocapture --tests 
'apache_beam/runners/dataflow/dataflow_runner_test.py:DataflowRunnerTest.test_remote_runner_display_data'
  

======================================================================
FAIL: test_remote_runner_display_data 
(apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File 
"/usr/local/google/home/valentyn/projects/beam/beam3/beam/sdks/python/apache_beam/runners/dataflow/dataflow_runner_test.py",
 line 283, in test_remote_runner_display_data
    self.assertEqual(len(disp_data), 4)
AssertionError: 3 != 4                                                          
         (This is intentional, otherwise the test passes)
-------------------- >> begin captured logging << --------------------
...
root: WARNING: _find_containing_class_inner called 5312 times
root: WARNING: _find_containing_class_inner called 5313 times
root: WARNING: _find_containing_class_inner called 5314 times                   
         (This correlates with the increased number of recursion level set to 
5555 in [1])
--------------------- >> end captured logging << --------------------
...
{noformat}
I reproduced this further in the following counterexample:
{noformat}
from apache_beam.internal import pickler                                        
                                                                                
                                                                                
                                                                                
                                                                                
                         
                                                                                
   
class Outter():                                                                 
   
  def method(self):                                                             
   
    class InnerA(object):                                                       
   
      def __init__(self):                                                       
   
        pass                                                                    
   
                                                                                
   
    class InnerB(InnerA):                                                       
   
      def __init__(self):                                                       
   
        super(InnerB, self).__init__()                                          
   
                                                                                
   
    o = InnerB()                                                                
   
    pickler.loads(pickler.dumps(o))                                             
   
                                                                                
   
c = Outter()                                                                    
   
c.method()    
{noformat}
Running the above via python -m test fails (on Py2 and Py3.x) with:
{noformat}
RuntimeError: maximum recursion depth exceeded while getting the str of an 
object
{noformat}
I think the long-term course of action here should be removing patches[2] of 
dill (contributing the patches upstream, or switching to a different pickler: 
BEAM-8123).

In short term, to unblock BEAM-5878 we can remove nested inner classes from 
test_remote_runner_display_data and proceed with planned dill upgrade to 
0.3.1.1. As noted above, the upgrade itself does not trigger the test failure, 
as the infinite recursion is already happening but for some reason in most 
cases the test does not fail. It would be nice also to file an issue against 
CPython to improve detection of infinite recursion, with a repro they can 
follow.

cc: [~robertwb] [~altay] [~udim] FYI.

 

[1] [https://github.com/apache/beam/pull/9859]

[2] 
https://github.com/apache/beam/blob/eb40876109a557586c33b77923025508712184ab/sdks/python/apache_beam/internal/pickler.py#L126


was (Author: tvalentyn):
It appears that test_remote_runner_display_data triggers infinite recursion in 
_find_containing_class / _find_containing_class_inner methods.

This recursion is present in Python 2 and all supported Python 3 versions, 
however Python is not reliably detecting this recursion, and so the recursion 
causes crashes and/or goes unnoticed depending on Python version and 
sys.getrecursionlimit(). To observe that there is a recursion problem, we can 
add some logging into _find_containing_class_inner. With changes in [1] we have:
{noformat}
python ./setup.py nosetests --nocapture --tests 
'apache_beam/runners/dataflow/dataflow_runner_test.py:DataflowRunnerTest.test_remote_runner_display_data'
  

======================================================================
FAIL: test_remote_runner_display_data 
(apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File 
"/usr/local/google/home/valentyn/projects/beam/beam3/beam/sdks/python/apache_beam/runners/dataflow/dataflow_runner_test.py",
 line 283, in test_remote_runner_display_data
    self.assertEqual(len(disp_data), 4)
AssertionError: 3 != 4                                                          
         (This is intentional, otherwise the test passes)
-------------------- >> begin captured logging << --------------------
...
root: WARNING: _find_containing_class_inner called 5312 times
root: WARNING: _find_containing_class_inner called 5313 times
root: WARNING: _find_containing_class_inner called 5314 times                   
         (This correlates with the increased number of recursion level set to 
5555 in [1])
--------------------- >> end captured logging << --------------------
...
{noformat}
I reproduced this further in the following counterexample:
{noformat}
from apache_beam.internal import pickler                                        
                                                                                
                                                                                
                                                                                
                                                                                
                         
                                                                                
   
class Outter():                                                                 
   
  def method(self):                                                             
   
    class InnerA(object):                                                       
   
      def __init__(self):                                                       
   
        pass                                                                    
   
                                                                                
   
    class InnerB(InnerA):                                                       
   
      def __init__(self):                                                       
   
        super(InnerB, self).__init__()                                          
   
                                                                                
   
    o = InnerB()                                                                
   
    pickler.loads(pickler.dumps(o))                                             
   
                                                                                
   
c = Outter()                                                                    
   
c.method()    
{noformat}
Running the above via python -m test fails (on Py2 and Py3.x) with:
{noformat}
RuntimeError: maximum recursion depth exceeded while getting the str of an 
object
{noformat}
I think the long-term course of action here should be removing monkey patches 
of dill (contributing the patches upstream, or switching to a different 
pickler: BEAM-8123).

In short term, to unblock BEAM-5878 we can remove nested inner classes from 
test_remote_runner_display_data and proceed with planned dill upgrade to 
0.3.1.1. As noted above, the upgrade itself does not trigger the test failure, 
as the infinite recursion is already happening but for some reason in most 
cases the test does not fail. It would be nice also to file an issue against 
CPython to improve detection of infinite recursion, with a repro they can 
follow.

cc: [~robertwb] [~altay] [~udim] FYI.

 

[1] [https://github.com/apache/beam/pull/9859]

> DataflowRunnerTest.test_remote_runner_display_data fails due to infinite 
> recursion during pickling.
> ---------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-8397
>                 URL: https://issues.apache.org/jira/browse/BEAM-8397
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-py-core
>            Reporter: Valentyn Tymofieiev
>            Assignee: Valentyn Tymofieiev
>            Priority: Major
>
> `python ./setup.py test -s 
> apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest.test_remote_runner_display_data`
>  passes.
> `tox -e py37-gcp` passes if Beam depends on dill==0.3.0, but fails if Beam 
> depends on dill==0.3.1.1.`python ./setup.py nosetests --tests 
> 'apache_beam/runners/dataflow/dataflow_runner_test.py:DataflowRunnerTest.test_remote_runner_display_data`
>  fails currently if run on master.
> The failure indicates infinite recursion during pickling:
> {noformat}
> test_remote_runner_display_data 
> (apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest) ... 
> Fatal Python error: Cannot recover from stack overflow.
> Current thread 0x00007f9d700ed740 (most recent call first):
>   File "/usr/lib/python3.7/pickle.py", line 479 in get
>   File "/usr/lib/python3.7/pickle.py", line 497 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1394 in save_function
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 882 in _batch_setitems
>   File "/usr/lib/python3.7/pickle.py", line 856 in save_dict
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 910 in save_module_dict
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py",
>  line 198 in new_save_module_dict
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py",
>  line 114 in wrapper
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 771 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1137 in save_cell
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 771 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1394 in save_function
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 882 in _batch_setitems
>   File "/usr/lib/python3.7/pickle.py", line 856 in save_dict
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 910 in save_module_dict
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py",
>  line 198 in new_save_module_dict
> ...
> {noformat}
> cc: [~yoshiki.obata]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to