Brian Hulette created BEAM-12955:
------------------------------------
Summary: Add support for inferring Beam Schemas from Python
protobuf types
Key: BEAM-12955
URL: https://issues.apache.org/jira/browse/BEAM-12955
Project: Beam
Issue Type: Improvement
Components: sdk-py-core
Reporter: Brian Hulette
Just as we can infer a Beam Schema from a NamedTuple type
([code|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/schemas.py]),
we should have support for inferring a schema from a [protobuf-generated
Python type|https://developers.google.com/protocol-buffers/docs/pythontutorial].
This inference should integrate with the rest of the schema infrastructure:
it should be possible to use schema-aware transforms like SqlTransform,
Select, or beam.dataframe.convert.to_dataframe on a PCollection that is
annotated with a protobuf type. For example, using the addressbook_pb2 module
from the
[tutorial|https://developers.google.com/protocol-buffers/docs/pythontutorial#reading-a-message]:
{code:python}
import addressbook_pb2
import apache_beam as beam
from apache_beam.dataframe.convert import to_dataframe
from apache_beam.transforms.sql import SqlTransform

# Annotate the output element type with the protobuf-generated class.
pc = (
    input_pc
    | beam.Map(create_person).with_output_types(addressbook_pb2.Person))
df = to_dataframe(pc)  # deferred dataframe with fields id, name, email, ...
# OR
pc | SqlTransform(
    "SELECT name FROM PCOLLECTION WHERE email = '[email protected]'")
{code}