[jira] [Commented] (BEAM-14508) Parquetio should produce a schema'd PCollection

Brian Hulette (Jira) Tue, 24 May 2022 17:25:04 -0700


    [ 
https://issues.apache.org/jira/browse/BEAM-14508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541760#comment-17541760
 ]


Brian Hulette commented on BEAM-14508:
--------------------------------------

Agreed! I think the ideal solution here would be to allow inferring schemas 
from a TypedDict typehint. Then the execution-time code wouldn't have to 
change; we'd just need to add a typehint. Unfortunately, we don't yet support 
schema inference from a TypedDict.

Until then, we could add an option that changes the output type to NamedTuple, 
similar to what Svetak is doing for BigQuery.
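
To illustrate the difference between the two output types, here is a minimal 
sketch in plain Python. It is not the ParquetIO implementation; UserDict, 
UserRow, and dict_to_row are hypothetical names made up for this example, and 
the two-field layout is an assumption. The idea is that the dict-shaped 
records could be described by a TypedDict, while a NamedTuple-producing option 
would convert each dict into a typed row.

```python
from typing import Any, Dict, NamedTuple, Optional, TypedDict


class UserDict(TypedDict):
    """Hypothetical typehint for dict-shaped records (illustration only)."""
    name: str
    age: Optional[int]


class UserRow(NamedTuple):
    """Hypothetical NamedTuple with the same fields as UserDict."""
    name: str
    age: Optional[int] = None


def dict_to_row(record: Dict[str, Any]) -> UserRow:
    """Convert one dict record into a UserRow.

    Keys that are not fields of UserRow are ignored; missing fields
    become None.
    """
    return UserRow(**{field: record.get(field) for field in UserRow._fields})


if __name__ == "__main__":
    print(dict_to_row({"name": "alice", "age": 30}))
```

With the TypedDict route, the records stay dicts and only the typehint 
changes; with the NamedTuple route, a conversion step like dict_to_row has to 
run for every element.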

Finally, for any user who lands here, I want to point out that there is a 
workaround: use the DataFrame API 
[read_parquet|https://beam.apache.org/releases/pydoc/2.38.0/apache_beam.dataframe.io.html#apache_beam.dataframe.io.read_parquet]
 method, then call to_pcollection on the result.
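
A minimal sketch of that workaround, assuming a Beam release that includes 
the DataFrame API (such as the 2.38.0 docs linked above). The helper name 
read_parquet_as_rows and the file pattern in the usage comment are 
hypothetical, and the Beam imports are placed inside the function only so that 
defining the helper does not itself require Beam to be importable.

```python
def read_parquet_as_rows(pipeline, file_pattern):
    """Read Parquet files via the DataFrame API and return a PCollection.

    `pipeline` is an apache_beam.Pipeline; `file_pattern` is a path or glob.
    Hypothetical helper, shown for illustration only.
    """
    # Imported here so this sketch can be defined without Beam installed.
    from apache_beam.dataframe.convert import to_pcollection
    from apache_beam.dataframe.io import read_parquet

    # read_parquet yields a deferred DataFrame when applied to a pipeline.
    deferred_df = pipeline | read_parquet(file_pattern)

    # to_pcollection converts the deferred DataFrame back into a PCollection.
    return to_pcollection(deferred_df)


# Usage (assumes apache_beam is installed and the placeholder path exists):
#
#   import apache_beam as beam
#   with beam.Pipeline() as p:
#       rows = read_parquet_as_rows(p, "/path/to/data/*.parquet")
#       rows | beam.Map(print)
```

The DataFrame API carries column names and types along with the data, which 
is why the result can be consumed as schema'd rows without changing ParquetIO 
itself.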

> Parquetio should produce a schema'd PCollection
> -----------------------------------------------
>
>                 Key: BEAM-14508
>                 URL: https://issues.apache.org/jira/browse/BEAM-14508
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Robert Bradshaw
>            Assignee: Brian Hulette
>            Priority: P2
>
> Or at least have an option to do so.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
