[ https://issues.apache.org/jira/browse/BEAM-11969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kenneth Knowles updated BEAM-11969:
-----------------------------------
Fix Version/s: 2.29.0 (was: 2.30.0)
> Make row-group size configurable in ParquetIO.Sink
> --------------------------------------------------
>
> Key: BEAM-11969
> URL: https://issues.apache.org/jira/browse/BEAM-11969
> Project: Beam
> Issue Type: Improvement
> Components: io-java-parquet
> Reporter: Bashir Sadjad
> Assignee: Alexey Romanenko
> Priority: P2
> Labels: easyfix
> Fix For: 2.29.0
>
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> ParquetIO.Sink does not appear to have an option for setting the row-group size.
> Its builder has a
> [withConfiguration|https://github.com/apache/beam/blob/fffb85a35df6ae3bdb2934c077856f6b27559aa7/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java#L1089]
> method, but it does not seem to change rowGroupSize in
> [ParquetWriter.Builder|https://github.com/apache/parquet-mr/blob/bdf935a43bd377c8052840a4328cf5b7603aa70a/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetWriter.java#L636],
> so the default of 128 MB is always used. It should be fairly easy to add the
> plumbing for setting this option
> [here|https://github.com/apache/beam/blob/fffb85a35df6ae3bdb2934c077856f6b27559aa7/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java#L1112].
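
The plumbing requested above could look roughly like the sketch below. It is only an illustration with stand-in classes: `Sink`, `WriterBuilder`, and the `withRowGroupSize` / `open` methods here are hypothetical and are not the actual Beam or parquet-mr code. The only value taken from the issue is the 128 MB default. The idea is that the sink stores an optional row-group size and forwards it to the writer builder when the writer is opened, leaving the default untouched otherwise.

```java
// Minimal sketch with hypothetical stand-in classes; NOT the real
// org.apache.beam.sdk.io.parquet.ParquetIO.Sink or parquet-mr ParquetWriter.Builder.
final class RowGroupSizeSketch {

  /** Default row-group size mentioned in the issue: 128 MB, in bytes. */
  static final int DEFAULT_ROW_GROUP_SIZE = 128 * 1024 * 1024;

  /** Stand-in for a writer builder that accepts a row-group size. */
  static final class WriterBuilder {
    private int rowGroupSize = DEFAULT_ROW_GROUP_SIZE;

    WriterBuilder withRowGroupSize(int bytes) {
      this.rowGroupSize = bytes;
      return this;
    }

    int rowGroupSize() {
      return rowGroupSize;
    }
  }

  /** Stand-in for an immutable sink that forwards its option on open. */
  static final class Sink {
    // null means "not set": the writer builder keeps its own default.
    private final Integer rowGroupSize;

    private Sink(Integer rowGroupSize) {
      this.rowGroupSize = rowGroupSize;
    }

    static Sink create() {
      return new Sink(null);
    }

    /** Returns a new sink configured with the given row-group size in bytes. */
    Sink withRowGroupSize(int bytes) {
      if (bytes <= 0) {
        throw new IllegalArgumentException("rowGroupSize must be positive");
      }
      return new Sink(bytes);
    }

    /** Builds the writer configuration, passing the option through if set. */
    WriterBuilder open() {
      WriterBuilder builder = new WriterBuilder();
      if (rowGroupSize != null) {
        builder.withRowGroupSize(rowGroupSize);
      }
      return builder;
    }
  }

  public static void main(String[] args) {
    // Unconfigured sink: the writer default applies.
    System.out.println(Sink.create().open().rowGroupSize());
    // Configured sink: a 16 MB row group is forwarded to the writer.
    System.out.println(
        Sink.create().withRowGroupSize(16 * 1024 * 1024).open().rowGroupSize());
  }
}
```

Keeping the field nullable rather than defaulting it in the sink means the writer library stays the single source of truth for the default value.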