[ https://issues.apache.org/jira/browse/BEAM-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Hulette updated BEAM-12502:
---------------------------------
    Affects Version/s: 2.29.0

> ib.collect fails to materialize named DeferredDataFrame instances
> -----------------------------------------------------------------
>
>                 Key: BEAM-12502
>                 URL: https://issues.apache.org/jira/browse/BEAM-12502
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>    Affects Versions: 2.29.0, 2.30.0
>            Reporter: Brian Hulette
>            Assignee: Sam Rohde
>            Priority: P2
>              Labels: dataframe-api
>             Fix For: 2.31.0
>
>
> In the example below, {{ib.collect(deferred_df)}} returns an empty DataFrame,
> while {{ib.collect(to_dataframe(rows))}} works as expected.
> {code}
> In [1]: import numpy as np
>    ...: import pandas as pd
>    ...:
>    ...: import apache_beam as beam
>    ...: from apache_beam import Create, Map
>    ...: from apache_beam.dataframe.convert import to_dataframe
>    ...: from apache_beam.dataframe.convert import to_pcollection
>    ...: from apache_beam.runners.interactive.interactive_runner import InteractiveRunner
>    ...: import apache_beam.runners.interactive.interactive_beam as ib
>
> In [2]: birds = [
>    ...:     {
>    ...:       "name": "American crow",
>    ...:       "scientific_name": "Corvus brachyrhynchos",
>    ...:       "order": "Passeriformes",
>    ...:       "family": "Corvidae"
>    ...:     },
>    ...:     {
>    ...:       "name": "Canada goose",
>    ...:       "scientific_name": "Branta canadensis",
>    ...:       "order": "Anseriformes",
>    ...:       "family": "Anatidae"
>    ...:     },
>    ...:     {
>    ...:       "name": "mallard",
>    ...:       "scientific_name": "Anas platyrhynchos",
>    ...:       "order": "Anseriformes",
>    ...:       "family": "Anatidae"
>    ...:     }
>    ...: ]
>
> In [3]: # create an interactive pipeline
>    ...: p = beam.Pipeline(InteractiveRunner())
>    ...:
>    ...: # create some pipeline data and map it to rows
>    ...: rows = (p | "Create elements" >> Create(birds)
>    ...:           | "To rows" >> Map(lambda bird: beam.Row(
>    ...:               common_name=bird["name"],
>    ...:               scientific_name=bird["scientific_name"],
>    ...:               order=bird["order"],
>    ...:               family=bird["family"])))
> WARNING:apache_beam.runners.interactive.interactive_environment:You have limited Interactive Beam features since your ipython kernel is not connected to any notebook frontend.
>
> In [4]: ib.collect(rows)
> 'Processing...'
> 'Done.'
> Out[4]:
>      common_name        scientific_name          order    family
> 0  American crow  Corvus brachyrhynchos  Passeriformes  Corvidae
> 1   Canada goose      Branta canadensis   Anseriformes  Anatidae
> 2        mallard     Anas platyrhynchos   Anseriformes  Anatidae
>
> In [5]: ib.collect(to_dataframe(rows))
> 'Processing...'
> 'Done.'
> Out[5]:
>      common_name        scientific_name          order    family
> 0  American crow  Corvus brachyrhynchos  Passeriformes  Corvidae
> 1   Canada goose      Branta canadensis   Anseriformes  Anatidae
> 2        mallard     Anas platyrhynchos   Anseriformes  Anatidae
>
> In [6]: deferred_df = to_dataframe(rows)
>
> In [7]: ib.collect(deferred_df)
> 'Processing...'
> 'Done.'
> Out[7]:
> Empty DataFrame
> Columns: []
> Index: []
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
