[ https://issues.apache.org/jira/browse/BEAM-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902108#comment-16902108 ]
Ryan Skraba commented on BEAM-4379:
-----------------------------------
I'm looking at how Spark handles splittable Parquet files. It looks like a
lot of their strategy is reusable and might be pushed upstream into Parquet
itself, especially their own ReadSupport for Catalyst data types (the
equivalent of BEAM-4812, reading and writing Rows directly from Parquet).

I'm still ramping up on the changes Parquet would need, but I won't be
offended if my conclusion is proven wrong or someone with more expertise takes
the JIRA, of course!
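
For readers unfamiliar with what "splittable" means here: a Parquet file is
stored as a sequence of row groups that can each be read independently, so one
plausible unit of splitting is a contiguous range of row groups. The sketch
below only illustrates that idea under this assumption. It calls no Parquet,
Spark, or Beam API, and the names RowGroup, plan_splits, and
desired_bundle_bytes are hypothetical, not taken from any of those projects.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RowGroup:
    index: int      # position of the row group within the file
    num_bytes: int  # size of the row group (assumed to come from the file footer)


def plan_splits(row_groups: List[RowGroup], desired_bundle_bytes: int) -> List[List[int]]:
    """Greedily pack consecutive row groups into splits near the desired size."""
    if desired_bundle_bytes <= 0:
        raise ValueError("desired_bundle_bytes must be positive")

    splits: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for rg in row_groups:
        # Close the current split if adding this row group would exceed the target.
        if current and current_bytes + rg.num_bytes > desired_bundle_bytes:
            splits.append(current)
            current, current_bytes = [], 0
        # A row group is never divided, so an oversized one becomes its own split.
        current.append(rg.index)
        current_bytes += rg.num_bytes
    if current:
        splits.append(current)
    return splits


if __name__ == "__main__":
    groups = [RowGroup(i, size) for i, size in enumerate([100, 100, 300, 50, 50])]
    print(plan_splits(groups, desired_bundle_bytes=200))
```

Each inner list is a range of row-group indices that one worker could read on
its own. A non-splittable read corresponds to a single list covering the whole
file, which is the situation the issue describes.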
> Make ParquetIO Read splittable
> ------------------------------
>
> Key: BEAM-4379
> URL: https://issues.apache.org/jira/browse/BEAM-4379
> Project: Beam
> Issue Type: Improvement
> Components: io-ideas, io-java-parquet
> Reporter: Lukasz Gajowy
> Priority: Major
>
> As the title says: ParquetIO Read is currently not splittable, which is not
> optimal for runners that support splitting.