Brian Hulette created BEAM-12502:
------------------------------------

             Summary: ib.collect fails to materialize named DeferredDataFrame instances
                 Key: BEAM-12502
                 URL: https://issues.apache.org/jira/browse/BEAM-12502
             Project: Beam
          Issue Type: Bug
          Components: sdk-py-core
    Affects Versions: 2.30.0
            Reporter: Brian Hulette
            Assignee: Sam Rohde
             Fix For: 2.31.0


In the example below, {{ib.collect(deferred_df)}} returns an empty DataFrame, 
while {{ib.collect(to_dataframe(rows))}}, which collects the same deferred 
dataframe without first binding it to a variable, works as expected.

{code}
In [1]: import numpy as np
   ...: import pandas as pd
   ...:
   ...: import apache_beam as beam
   ...: from apache_beam import Create, Map
   ...: from apache_beam.dataframe.convert import to_dataframe
   ...: from apache_beam.dataframe.convert import to_pcollection
   ...: from apache_beam.runners.interactive.interactive_runner import InteractiveRunner
   ...: import apache_beam.runners.interactive.interactive_beam as ib

In [2]: birds = [
   ...:     {
   ...:       "name": "American crow",
   ...:       "scientific_name": "Corvus brachyrhynchos",
   ...:       "order": "Passeriformes",
   ...:       "family": "Corvidae"
   ...:     },
   ...:     {
   ...:       "name": "Canada goose",
   ...:       "scientific_name": "Branta canadensis",
   ...:       "order": "Anseriformes",
   ...:       "family": "Anatidae"
   ...:     },
   ...:     {
   ...:       "name": "mallard",
   ...:       "scientific_name": "Anas platyrhynchos",
   ...:       "order": "Anseriformes",
   ...:       "family": "Anatidae"
   ...:     }
   ...: ]

In [3]: # create an interactive pipeline
   ...: p = beam.Pipeline(InteractiveRunner())
   ...:
   ...:
   ...: # create some pipeline data and map it to rows
   ...: rows = (p | "Create elements" >> Create(birds)
   ...:           | "To rows" >> Map(lambda bird: beam.Row(
   ...:               common_name=bird["name"],
   ...:               scientific_name=bird["scientific_name"],
   ...:               order=bird["order"],
   ...:               family=bird["family"])))
WARNING:apache_beam.runners.interactive.interactive_environment:You have limited Interactive Beam features since your ipython kernel is not connected to any notebook frontend.

In [4]: ib.collect(rows)
'Processing...'
'Done.'
Out[4]:
     common_name        scientific_name          order    family
0  American crow  Corvus brachyrhynchos  Passeriformes  Corvidae
1   Canada goose      Branta canadensis   Anseriformes  Anatidae
2        mallard     Anas platyrhynchos   Anseriformes  Anatidae

In [5]: ib.collect(to_dataframe(rows))
'Processing...'
'Done.'
Out[5]:
     common_name        scientific_name          order    family
0  American crow  Corvus brachyrhynchos  Passeriformes  Corvidae
1   Canada goose      Branta canadensis   Anseriformes  Anatidae
2        mallard     Anas platyrhynchos   Anseriformes  Anatidae

In [6]: deferred_df = to_dataframe(rows)

In [7]: ib.collect(deferred_df)
'Processing...'
'Done.'
Out[7]:
Empty DataFrame
Columns: []
Index: []
{code}
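
To turn the session above into a quick regression check, the two collected results can be compared by shape. The following is only a minimal sketch: {{shape_mismatch}} is a hypothetical helper name, not part of Beam, and it assumes nothing about its arguments beyond a {{.shape}} attribute such as a pandas DataFrame provides. The two results are passed in rather than computed inside the helper, so the variable binding stays exactly as in the session.

```python
from typing import Any, Optional


def shape_mismatch(inline_result: Any, named_result: Any) -> Optional[str]:
    """Describe a shape difference between two collected results.

    Hypothetical helper, not part of Beam. Both arguments only need a
    `.shape` attribute, as a pandas DataFrame has. Returns None when the
    shapes agree, otherwise a short message naming both shapes.
    """
    inline_shape = tuple(inline_result.shape)
    named_shape = tuple(named_result.shape)
    if inline_shape == named_shape:
        return None
    return (
        f"inline collect has shape {inline_shape}, "
        f"named collect has shape {named_shape}"
    )
```

In the session this would be called as {{shape_mismatch(ib.collect(to_dataframe(rows)), ib.collect(deferred_df))}}. Going by the output shown above, the inline result has shape (3, 4) and the named one has shape (0, 0), so the helper would report a mismatch; once the bug is fixed it should return None.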



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
