[ https://issues.apache.org/jira/browse/BEAM-14508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541760#comment-17541760 ]
Brian Hulette commented on BEAM-14508:
--------------------------------------
Agreed! I think the ideal solution here would be to allow inferring schemas
from a TypedDict typehint. Then the execution-time code wouldn't have to
change; we'd just need to add a typehint. Unfortunately, we don't yet support
schema inference from a TypedDict.
Until then, we could add an option that changes the output type to NamedTuple,
similar to what Svetak is doing for BigQuery.
Finally, for any user who lands here, I want to point out that there is a
workaround: use the DataFrame API
[read_parquet|https://beam.apache.org/releases/pydoc/2.38.0/apache_beam.dataframe.io.html#apache_beam.dataframe.io.read_parquet]
function, then pass the resulting DataFrame to to_pcollection.
> Parquetio should produce a schema'd PCollection
> -----------------------------------------------
>
> Key: BEAM-14508
> URL: https://issues.apache.org/jira/browse/BEAM-14508
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Robert Bradshaw
> Assignee: Brian Hulette
> Priority: P2
>
> Or at least have an option to do so.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)