damccorm opened a new issue, #21171: URL: https://github.com/apache/beam/issues/21171
Just as we can infer a Beam Schema from a NamedTuple type ([code](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/schemas.py)), we should support inferring a schema from a [protobuf-generated Python type](https://developers.google.com/protocol-buffers/docs/pythontutorial).

This should integrate well with the rest of the schema infrastructure. For example, it should be possible to use schema-aware transforms like [SqlTransform](https://beam.apache.org/releases/pydoc/2.32.0/apache_beam.transforms.sql.html#apache_beam.transforms.sql.SqlTransform), [Select](https://beam.apache.org/releases/pydoc/2.32.0/apache_beam.transforms.core.html#apache_beam.transforms.core.Select), or [beam.dataframe.convert.to_dataframe](https://beam.apache.org/releases/pydoc/2.32.0/apache_beam.dataframe.convert.html#apache_beam.dataframe.convert.to_dataframe) on a PCollection that is annotated with a protobuf type.

For example (using the addressbook_pb2 example from the [tutorial](https://developers.google.com/protocol-buffers/docs/pythontutorial#reading-a-message)):

```
import addressbook_pb2
import apache_beam as beam
from apache_beam.dataframe.convert import to_dataframe
from apache_beam.transforms.sql import SqlTransform

# input_pc and create_person are assumed to be defined elsewhere.
pc = (input_pc
      | beam.Map(create_person).with_output_types(addressbook_pb2.Person))

df = to_dataframe(pc)  # deferred dataframe with fields id, name, email, ...

# OR

pc | SqlTransform("SELECT name FROM PCOLLECTION WHERE email = '[email protected]'")
```

Imported from Jira [BEAM-12955](https://issues.apache.org/jira/browse/BEAM-12955). Original Jira may contain additional context. Reported by: bhulette.
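One way to picture the requested inference is as a mapping from protobuf field metadata to a NamedTuple, which Beam can already turn into a schema. The following is only a rough sketch under stated assumptions, not Beam's implementation: `FieldSpec` is a hypothetical stand-in for the `name`, `type`, and `label` attributes of a protobuf `FieldDescriptor`, `named_tuple_from_fields` is an illustrative helper, and only a few scalar types are handled (nested messages such as `Person.phones` are left out). The numeric type and label codes are the ones defined in `descriptor.proto`.

```python
from typing import List, NamedTuple, Sequence

# Numeric codes as defined in descriptor.proto (FieldDescriptorProto.Type / Label).
TYPE_DOUBLE, TYPE_FLOAT, TYPE_INT64, TYPE_INT32 = 1, 2, 3, 5
TYPE_BOOL, TYPE_STRING, TYPE_BYTES = 8, 9, 12
LABEL_OPTIONAL, LABEL_REPEATED = 1, 3

# Scalar protobuf types mapped to the Python types a NamedTuple would declare.
SCALAR_TYPES = {
    TYPE_DOUBLE: float,
    TYPE_FLOAT: float,
    TYPE_INT64: int,
    TYPE_INT32: int,
    TYPE_BOOL: bool,
    TYPE_STRING: str,
    TYPE_BYTES: bytes,
}


class FieldSpec(NamedTuple):
    """Hypothetical stand-in for a protobuf FieldDescriptor."""
    name: str
    type: int
    label: int = LABEL_OPTIONAL


def named_tuple_from_fields(type_name: str, fields: Sequence[FieldSpec]):
    """Build a NamedTuple type whose fields mirror the given field specs."""
    annotations = []
    for field in fields:
        if field.type not in SCALAR_TYPES:
            # Nested messages, enums, etc. are out of scope for this sketch.
            raise TypeError(
                f"unsupported field type {field.type} for {field.name!r}")
        py_type = SCALAR_TYPES[field.type]
        if field.label == LABEL_REPEATED:
            py_type = List[py_type]
        annotations.append((field.name, py_type))
    return NamedTuple(type_name, annotations)


# Scalar fields of the tutorial's Person message, written out by hand.
PersonRow = named_tuple_from_fields("PersonRow", [
    FieldSpec("name", TYPE_STRING),
    FieldSpec("id", TYPE_INT32),
    FieldSpec("email", TYPE_STRING),
])
```

With a real generated class, the field specs would presumably come from `addressbook_pb2.Person.DESCRIPTOR.fields` rather than being listed by hand, and the resulting type could then be used wherever a hand-written NamedTuple is used as an output type hint today.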
