[ 
https://issues.apache.org/jira/browse/PARQUET-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17137967#comment-17137967
 ] 

Xinli Shang commented on PARQUET-1872:
--------------------------------------

[~gszadovszky] Thanks for the reply! I just manually linked the PR.

For the subtask, I was thinking of getting the parquet-tools change reviewed 
first and then adding it to parquet-cli, instead of changing both at the same 
time. But I am also fine with changing both places in the same PR. I just 
added it to parquet-cli in the newest PR. 

ColumnIndex and OffsetIndex are taken care of in my PR. I also added tests 
for both ColumnIndex and OffsetIndex validation. 

For the bloom filter, I will work on the subtask when this PR is done. That 
would require copying the existing bloom filters over to the new files.

Xinli 

> Add TransCompression command 
> -----------------------------
>
>                 Key: PARQUET-1872
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1872
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.12.0
>            Reporter: Xinli Shang
>            Assignee: Xinli Shang
>            Priority: Major
>
> As ZSTD becomes more popular, there is a need to translate existing data to 
> ZSTD compression, which can achieve a higher compression ratio. It would be 
> useful to have a tool that converts a Parquet file directly by just 
> decompressing/compressing each page, without decoding/encoding or assembling 
> the records, because that is much faster. The initial result shows it is 
> ~5 times faster. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
