[GitHub] [beam] damccorm opened a new issue, #21578: Nullable Integer support in with pandas not working as expected

GitBox Sat, 04 Jun 2022 17:26:22 -0700


damccorm opened a new issue, #21578:
URL: https://github.com/apache/beam/issues/21578


   I am reading data from a Parquet file, and one of the columns is a nullable 
integer column 
([https://pandas.pydata.org/docs/user_guide/integer_na.html#integer-na](https://pandas.pydata.org/docs/user_guide/integer_na.html#integer-na)).
   
   I am not 100% sure I declared it correctly:
   
    
   ```
   import typing
   from typing import Dict, Iterable, List, Optional

   import apache_beam as beam
   from apache_beam.options.pipeline_options import PipelineOptions


   class Record(typing.NamedTuple):
       port: Optional[int]
       #port: str


   recFields = set([i for i in Record.__dict__.keys() if i[:1] != '_'])
   beam.coders.registry.register_coder(Record, beam.coders.RowCoder)


   def extractDF(tuple):
       df = tuple[1].to_pandas()
       print(type(df.port.dtype))
       return df


   input_patterns = ['data/*.parquet']
   #local runner
   options = PipelineOptions(flags=[], type_check_additional='all')


   def toRecords(df):
       #df["port"]=None
       return df.to_dict('records')


   with beam.Pipeline(options=options) as pipeline:
       lines = (pipeline
                | 'Create file patterns' >> beam.Create(input_patterns)
                | 'Read Parquet files' >> beam.io.ReadAllFromParquetBatched(columns=recFields, with_filename=True)
                | 'Extract DF' >> beam.Map(extractDF)
                | 'To dictionaries' >> beam.FlatMap(toRecords)
                | 'ToRows' >> beam.Map(lambda x: Record(**x)).with_output_types(Record)
                | "print" >> beam.Map(print))
   ```
   
   
   This fails with a type error. 
   When I uncomment the line in toRecords that sets the whole port column to 
None, it works fine. 
   
   
   
   Imported from Jira 
[BEAM-14228](https://issues.apache.org/jira/browse/BEAM-14228). Original Jira 
may contain additional context.
   Reported by: kohlerm.


