[ https://issues.apache.org/jira/browse/PARQUET-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17137967#comment-17137967 ]
Xinli Shang commented on PARQUET-1872:
--------------------------------------

[~gszadovszky] Thanks for the reply! I just manually linked the PR.

For the subtask, I was planning to get the parquet-tools changes reviewed first and then add the command to parquet-cli, rather than changing both at the same time. But I am also fine with making the changes in both places in the same PR, so I have just added it to parquet-cli in the newest PR.

ColumnIndex and OffsetIndex are handled in my PR, and I also added tests that validate both ColumnIndex and OffsetIndex.

For bloom filters, I will work on the subtask once this PR is done. That will require copying the existing bloom filters over to the new files.

Xinli

> Add TransCompression command
> ----------------------------
>
>                 Key: PARQUET-1872
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1872
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.12.0
>            Reporter: Xinli Shang
>            Assignee: Xinli Shang
>            Priority: Major
>
> As ZSTD becomes more popular, there is a need to convert existing data to
> ZSTD compression, which can achieve a higher compression ratio. It would be
> useful to have a tool that converts a Parquet file directly by only
> decompressing and recompressing each page, without decoding/encoding the
> values or assembling records, because that is much faster. Initial results
> show it is about 5 times faster.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
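The page-level idea in the issue description (decompress and recompress each page, never decode the values) can be illustrated with a toy sketch. This is not the parquet-mr implementation: the `Page` type and the `transcompress_page`/`transcompress_pages` functions are hypothetical, and because Python's standard library has no ZSTD codec, `zlib` and `bz2` stand in for the source and target codecs. Real Parquet files also need page headers and metadata updated (for example the compressed page size and the column chunk codec), which this sketch leaves out.

```python
import bz2
import zlib
from dataclasses import dataclass

# Hypothetical stand-ins: zlib plays the source codec and bz2 plays the target
# codec (Python's stdlib has no ZSTD). Each entry is (compress, decompress).
CODECS = {
    "gzip": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
}


@dataclass
class Page:
    codec: str      # name of the codec the payload is compressed with
    payload: bytes  # compressed bytes of the (still encoded) page data


def transcompress_page(page: Page, target: str) -> Page:
    """Decompress one page and recompress it with the target codec.

    The decompressed bytes are never decoded: whatever encoding the page
    uses (dictionary, RLE, plain, ...) is carried over byte for byte.
    """
    _, decompress = CODECS[page.codec]
    compress, _ = CODECS[target]
    encoded = decompress(page.payload)
    return Page(codec=target, payload=compress(encoded))


def transcompress_pages(pages: list[Page], target: str) -> list[Page]:
    """Convert every page independently; no records are assembled."""
    return [transcompress_page(p, target) for p in pages]


if __name__ == "__main__":
    encoded_pages = [b"\x01\x02\x03" * 100, b"plain-encoded values " * 50]
    source = [Page("gzip", zlib.compress(b)) for b in encoded_pages]
    converted = transcompress_pages(source, "bz2")
    for before, after in zip(source, converted):
        # The encoded page bytes survive the codec switch unchanged.
        assert bz2.decompress(after.payload) == zlib.decompress(before.payload)
        print(f"{before.codec}: {len(before.payload)} B -> "
              f"{after.codec}: {len(after.payload)} B")
```

Skipping the decode/encode and record-assembly steps is what the issue credits for the speedup; the sketch only shows that the codec switch can be done per page on opaque encoded bytes.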