Brian Hulette created BEAM-12955:
------------------------------------

             Summary: Add support for inferring Beam Schemas from Python protobuf types
                 Key: BEAM-12955
                 URL: https://issues.apache.org/jira/browse/BEAM-12955
             Project: Beam
          Issue Type: Improvement
          Components: sdk-py-core
            Reporter: Brian Hulette


Just as we can infer a Beam Schema from a NamedTuple type 
([code|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/schemas.py]),
we should support inferring a schema from a [protobuf-generated 
Python type|https://developers.google.com/protocol-buffers/docs/pythontutorial].
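
For illustration, here is a rough, stdlib-only sketch of the type mapping such inference would need at its core. FieldSpec, beam_type and infer_schema are hypothetical names, and the scalar-type mapping below is an assumption, not a settled design. A real implementation would presumably walk the generated class's DESCRIPTOR instead of a hand-built field list, and would also have to decide how to handle unsigned and fixed-width integers, enums, maps and oneofs, which this sketch leaves out.

{code:python}
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical, simplified stand-in for a protobuf FieldDescriptor; only the
# attributes this sketch needs.
@dataclass
class FieldSpec:
    name: str
    proto_type: str  # "double", "float", "int32", "int64", "bool", "string", "bytes" or "message"
    repeated: bool = False
    message_fields: Optional[List["FieldSpec"]] = None  # used when proto_type == "message"


# Assumed mapping from protobuf scalar types to Beam schema atomic types.
_SCALAR_TO_BEAM = {
    "double": "DOUBLE",
    "float": "FLOAT",
    "int32": "INT32",
    "int64": "INT64",
    "bool": "BOOLEAN",
    "string": "STRING",
    "bytes": "BYTES",
}


def beam_type(f: FieldSpec) -> str:
    """Return a Beam schema type string for a single protobuf field."""
    if f.proto_type == "message":
        # Nested messages become nested rows.
        inner = ", ".join(f"{c.name}: {beam_type(c)}" for c in (f.message_fields or []))
        base = f"ROW<{inner}>"
    else:
        # Raises KeyError for types this sketch does not map (e.g. uint64, enums).
        base = _SCALAR_TO_BEAM[f.proto_type]
    # Repeated fields become arrays of the element type.
    return f"ARRAY<{base}>" if f.repeated else base


def infer_schema(fields: List[FieldSpec]) -> List[Tuple[str, str]]:
    """Map each top-level field of a message to a (name, Beam type) pair."""
    return [(f.name, beam_type(f)) for f in fields]


# A simplified addressbook_pb2.Person: phones keeps only its "number" field.
person = [
    FieldSpec("name", "string"),
    FieldSpec("id", "int32"),
    FieldSpec("email", "string"),
    FieldSpec("phones", "message", repeated=True,
              message_fields=[FieldSpec("number", "string")]),
]
print(infer_schema(person))
# [('name', 'STRING'), ('id', 'INT32'), ('email', 'STRING'), ('phones', 'ARRAY<ROW<number: STRING>>')]
{code}

Mapping nested messages to ROW and repeated fields to ARRAY mirrors how Beam schemas already nest rows and arrays, so schema-aware transforms could see protobuf fields as ordinary columns.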

This should integrate well with the rest of the schema infrastructure: it 
should be possible to apply schema-aware transforms such as SqlTransform, 
Select, or beam.dataframe.convert.to_dataframe to a PCollection that is 
annotated with a protobuf type. For example, using the addressbook_pb2 module 
from the 
[tutorial|https://developers.google.com/protocol-buffers/docs/pythontutorial#reading-a-message]:

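{code:python}
import addressbook_pb2  # generated by protoc from addressbook.proto

import apache_beam as beam
from apache_beam.dataframe.convert import to_dataframe
from apache_beam.transforms.sql import SqlTransform

# input_pc and create_person are placeholders: some upstream PCollection and a
# function that builds an addressbook_pb2.Person from each element.
pc = (input_pc
      | beam.Map(create_person).with_output_types(addressbook_pb2.Person))

df = to_dataframe(pc)  # deferred dataframe with fields id, name, email, ...

# OR

names = pc | SqlTransform(
    "SELECT name FROM PCOLLECTION WHERE email = '[email protected]'")
{code}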



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
