[ 
https://issues.apache.org/jira/browse/BEAM-11969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-11969:
------------------------------------
    Status: Open  (was: Triage Needed)

> Make row-group size configurable in ParquetIO.Sink
> --------------------------------------------------
>
>                 Key: BEAM-11969
>                 URL: https://issues.apache.org/jira/browse/BEAM-11969
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-parquet
>            Reporter: Bashir Sadjad
>            Priority: P2
>              Labels: easyfix
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> ParquetIO.Sink does not appear to offer an option for setting the row-group
> size. Its builder has a
> [withConfiguration|https://github.com/apache/beam/blob/fffb85a35df6ae3bdb2934c077856f6b27559aa7/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java#L1089]
> method, but that does not seem to change rowGroupSize in
> [ParquetWriter.Builder|https://github.com/apache/parquet-mr/blob/bdf935a43bd377c8052840a4328cf5b7603aa70a/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetWriter.java#L636],
> so the default of 128 MB is always used. It should be fairly easy to add the
> plumbing for setting this option
> [here|https://github.com/apache/beam/blob/fffb85a35df6ae3bdb2934c077856f6b27559aa7/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java#L1112].
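
The plumbing requested above could look roughly like the following. This is only a self-contained sketch of the pattern, not Beam or Parquet code: the classes Sink and WriterBuilder are hypothetical stand-ins for ParquetIO.Sink and ParquetWriter.Builder, and the withRowGroupSize method on the sink is an assumed name for the new option. The only fact taken from the issue is the 128 MB default.

```java
public class RowGroupSizeSketch {
    // Default row-group size mentioned in the issue: 128 MB.
    static final int DEFAULT_ROW_GROUP_SIZE = 128 * 1024 * 1024;

    /** Hypothetical stand-in for ParquetWriter.Builder. */
    static final class WriterBuilder {
        private int rowGroupSize = DEFAULT_ROW_GROUP_SIZE;

        WriterBuilder withRowGroupSize(int bytes) {
            this.rowGroupSize = bytes;
            return this;
        }

        int rowGroupSize() {
            return rowGroupSize;
        }
    }

    /** Hypothetical stand-in for ParquetIO.Sink (immutable, with-style setters). */
    static final class Sink {
        // null means "not configured": the writer keeps its own default.
        private final Integer rowGroupSize;

        private Sink(Integer rowGroupSize) {
            this.rowGroupSize = rowGroupSize;
        }

        static Sink create() {
            return new Sink(null);
        }

        /** Assumed name for the new option; returns a modified copy. */
        Sink withRowGroupSize(int bytes) {
            if (bytes <= 0) {
                throw new IllegalArgumentException("rowGroupSize must be positive");
            }
            return new Sink(bytes);
        }

        /** Where the writer is built: forward the option only if it was set. */
        WriterBuilder open() {
            WriterBuilder builder = new WriterBuilder();
            if (rowGroupSize != null) {
                builder.withRowGroupSize(rowGroupSize);
            }
            return builder;
        }
    }

    public static void main(String[] args) {
        Sink unset = Sink.create();
        Sink tuned = Sink.create().withRowGroupSize(32 * 1024 * 1024);
        System.out.println(unset.open().rowGroupSize());
        System.out.println(tuned.open().rowGroupSize());
    }
}
```

Keeping the field nullable means an unconfigured sink leaves the writer's default untouched, so existing pipelines would behave as before.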



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
