[ https://issues.apache.org/jira/browse/BEAM-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175703#comment-17175703 ]
Jiadai Xia commented on BEAM-4379:
----------------------------------
I have finished implementing a Splittable DoFn in ParquetIO. Files are
initially split into blocks of around 64MB, and each block can be split further
if necessary. Splitting is block based, so the smallest possible split is a
single row group. Splittable reading is opt-in: enable it by adding
.withSplit() after the read files step. I have tested the code on the Dataflow
runner and the Direct Runner, and it is now ready for review. [~aromanenko]
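
To illustrate the block-based splitting described above, here is a minimal,
self-contained sketch in plain Java. It is not the ParquetIO code under review:
the class name, the BlockRange type, the method names, and the greedy grouping
rule are all assumptions made for illustration. Only the roughly 64MB target
and the one-row-group minimum come from the comment.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and grouping rule are hypothetical,
// not taken from the actual ParquetIO implementation.
class RowGroupSplitSketch {
    // Assumed target size for an initial split ("around 64MB").
    static final long TARGET_BYTES = 64L * 1024 * 1024;

    // Half-open range [from, to) of row-group indices within one file.
    static final class BlockRange {
        final int from;
        final int to;

        BlockRange(int from, int to) {
            this.from = from;
            this.to = to;
        }

        @Override
        public String toString() {
            return "[" + from + ", " + to + ")";
        }
    }

    // Greedily groups consecutive row groups until each group reaches the target size.
    static List<BlockRange> initialSplits(long[] rowGroupBytes, long targetBytes) {
        List<BlockRange> splits = new ArrayList<>();
        int start = 0;
        long accumulated = 0;
        for (int i = 0; i < rowGroupBytes.length; i++) {
            accumulated += rowGroupBytes[i];
            if (accumulated >= targetBytes) {
                splits.add(new BlockRange(start, i + 1));
                start = i + 1;
                accumulated = 0;
            }
        }
        // Any trailing row groups form a final, smaller split.
        if (start < rowGroupBytes.length) {
            splits.add(new BlockRange(start, rowGroupBytes.length));
        }
        return splits;
    }

    // Halves a range at a row-group boundary; a single row group cannot be split.
    static List<BlockRange> splitFurther(BlockRange range) {
        List<BlockRange> parts = new ArrayList<>();
        if (range.to - range.from <= 1) {
            parts.add(range);
            return parts;
        }
        int mid = range.from + (range.to - range.from) / 2;
        parts.add(new BlockRange(range.from, mid));
        parts.add(new BlockRange(mid, range.to));
        return parts;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long[] sizes = {30 * mb, 40 * mb, 20 * mb, 50 * mb, 10 * mb};
        List<BlockRange> splits = initialSplits(sizes, TARGET_BYTES);
        System.out.println("initial splits: " + splits);
        System.out.println("further split of first: " + splitFurther(splits.get(0)));
    }
}
```

Because a range always ends on a row-group boundary, a reader working on one
range never needs to decode part of a row group, which is why one row group is
the lower bound on split size.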
> Make ParquetIO Read splittable
> ------------------------------
>
> Key: BEAM-4379
> URL: https://issues.apache.org/jira/browse/BEAM-4379
> Project: Beam
> Issue Type: Improvement
> Components: io-ideas, io-java-parquet
> Reporter: Lukasz Gajowy
> Assignee: Jiadai Xia
> Priority: P2
>
> As the title says: ParquetIO read is currently not splittable, which is not
> optimal for runners that support splitting.