damccorm opened a new issue, #21578: URL: https://github.com/apache/beam/issues/21578
I am reading data from a Parquet file, and one of the columns is a nullable integer (https://pandas.pydata.org/docs/user_guide/integer_na.html#integer-na). I am not 100% sure I declared it correctly:

```
import typing
from typing import Dict, Iterable, List, Optional

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


class Record(typing.NamedTuple):
    port: Optional[int]
    # port: str


# Column names to read: the public attributes of Record.
recFields = set([i for i in Record.__dict__.keys() if i[:1] != '_'])
beam.coders.registry.register_coder(Record, beam.coders.RowCoder)


def extractDF(element):
    # with_filename=True yields (filename, table) pairs; convert the table.
    df = element[1].to_pandas()
    print(type(df.port.dtype))
    return df


input_patterns = ['data/*.parquet']
# local runner
options = PipelineOptions(flags=[], type_check_additional='all')


def toRecords(df):
    # df["port"] = None
    return df.to_dict('records')


with beam.Pipeline(options=options) as pipeline:
    lines = (
        pipeline
        | 'Create file patterns' >> beam.Create(input_patterns)
        | 'Read Parquet files' >> beam.io.ReadAllFromParquetBatched(
            columns=recFields, with_filename=True)
        | 'Extract DF' >> beam.Map(extractDF)
        | 'To dictionaries' >> beam.FlatMap(toRecords)
        | 'ToRows' >> beam.Map(lambda x: Record(**x)).with_output_types(Record)
        | 'print' >> beam.Map(print))
```

This fails with a type error. When I uncomment the line in `toRecords` that sets every `port` value to `None`, it works fine.

Imported from Jira [BEAM-14228](https://issues.apache.org/jira/browse/BEAM-14228). Original Jira may contain additional context.
Reported by: kohlerm.
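A possible workaround, sketched under assumptions: the report does not include the traceback, so the cause is not confirmed here. One plausible reading is that `to_dict('records')` passes pandas missing-value markers (`pd.NA` for a nullable `Int64` column, or `NaN` if the column was loaded as `float64`) through to `Record`, and those do not match `Optional[int]`. The hypothetical `to_records` helper below, which is not part of Beam or of the original code, converts each `port` to a plain `int` or `None` before building the `Record`. It uses only pandas so it can be tried outside a pipeline.

```python
from typing import NamedTuple, Optional

import pandas as pd


class Record(NamedTuple):
    port: Optional[int]


def to_records(df):
    """Hypothetical replacement for toRecords (an assumption, not Beam API).

    Yields Record instances whose port is a plain Python int or None.
    """
    for row in df.to_dict("records"):
        port = row["port"]
        # pd.isna is True for pd.NA, float NaN and None.
        yield Record(port=None if pd.isna(port) else int(port))


# Small in-memory stand-in for the Parquet data, using the nullable Int64 dtype.
df = pd.DataFrame({"port": pd.array([80, None], dtype="Int64")})
records = list(to_records(df))
```

In the pipeline this would replace both the `'To dictionaries'` and `'ToRows'` steps with a single `beam.FlatMap(to_records).with_output_types(Record)`. Whether that resolves the type error in the reporter's environment is untested.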
