damccorm opened a new issue, #21171:
URL: https://github.com/apache/beam/issues/21171

   Just as we can infer a Beam schema from a NamedTuple type
   ([code](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/schemas.py)),
   we should support inferring a schema from a
   [protobuf-generated Python type](https://developers.google.com/protocol-buffers/docs/pythontutorial).
   
   This should integrate well with the rest of the schema infrastructure. For
   example, it should be possible to use schema-aware transforms like
   [SqlTransform](https://beam.apache.org/releases/pydoc/2.32.0/apache_beam.transforms.sql.html#apache_beam.transforms.sql.SqlTransform),
   [Select](https://beam.apache.org/releases/pydoc/2.32.0/apache_beam.transforms.core.html#apache_beam.transforms.core.Select),
   or
   [beam.dataframe.convert.to_dataframe](https://beam.apache.org/releases/pydoc/2.32.0/apache_beam.dataframe.convert.html#apache_beam.dataframe.convert.to_dataframe)
   on a PCollection that is annotated with a protobuf type. For instance, using
   the addressbook_pb2 example from the
   [tutorial](https://developers.google.com/protocol-buffers/docs/pythontutorial#reading-a-message):
   
   ```
   import addressbook_pb2

   import apache_beam as beam
   from apache_beam.dataframe.convert import to_dataframe
   from apache_beam.transforms.sql import SqlTransform

   # Annotate the output with the protobuf-generated type so that a schema
   # can be inferred from it.
   pc = (input_pc
         | beam.Map(create_person).with_output_types(addressbook_pb2.Person))

   df = to_dataframe(pc)
   # deferred dataframe with fields id, name, email, ...

   # OR

   pc | SqlTransform(
       "SELECT name FROM PCOLLECTION WHERE email = '[email protected]'")
   ```
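
One possible way to approach the inference, sketched here under stated assumptions, is to translate a message's fields into a NamedTuple, which the existing schema inference already understands. `FieldSpec`, `PROTO_TO_PYTHON`, and `namedtuple_from_fields` are hypothetical names for illustration, not Beam or protobuf APIs. A real implementation would read the generated class's descriptor rather than a hand-written field list, and would also need to handle nested messages, enums, and optional fields, which this sketch omits.

```python
from typing import NamedTuple, Sequence


class FieldSpec(NamedTuple):
    """Hypothetical stand-in for a protobuf field descriptor."""
    name: str
    proto_type: str        # protobuf scalar type name, e.g. "int32"
    repeated: bool = False


# Assumed mapping from protobuf scalar type names to Python types.
PROTO_TO_PYTHON = {
    "int32": int, "int64": int, "uint32": int, "uint64": int,
    "float": float, "double": float,
    "bool": bool, "string": str, "bytes": bytes,
}


def namedtuple_from_fields(type_name, fields):
    """Build a NamedTuple type whose fields mirror the given field specs."""
    annotations = []
    for field in fields:
        py_type = PROTO_TO_PYTHON[field.proto_type]
        if field.repeated:
            # Repeated scalar fields become sequences of the element type.
            py_type = Sequence[py_type]
        annotations.append((field.name, py_type))
    return NamedTuple(type_name, annotations)


# Scalar fields of the tutorial's Person message (the repeated, nested
# `phones` field is left out of this sketch).
Person = namedtuple_from_fields("Person", [
    FieldSpec("name", "string"),
    FieldSpec("id", "int32"),
    FieldSpec("email", "string"),
])
```

The resulting `Person` type has the fields `name`, `id`, and `email`, so in principle it could be used as an output type hint in the same way as a hand-written NamedTuple.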
   
   
   Imported from Jira 
[BEAM-12955](https://issues.apache.org/jira/browse/BEAM-12955). Original Jira 
may contain additional context.
   Reported by: bhulette.

