[ https://issues.apache.org/jira/browse/PARQUET-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030750#comment-16030750 ]

Uwe L. Korn commented on PARQUET-1011:
--------------------------------------

We can add {{bzip2}} to Parquet, but this would only change the compression; it 
would have no effect on splittability. By the design of the format, Parquet 
files are always splittable, independent of the compression codec used: 
compression is applied to individual pages inside each column chunk, and splits 
are planned along row-group boundaries, so the codec never determines where a 
reader can start. In particular, this means that GZIP-compressed Parquet files 
are splittable as well. In your case, it is probably easier to stick with GZIP 
than to implement {{bzip2}} in Parquet.
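To illustrate why the codec does not matter for splitting, here is a heavily simplified toy model (not Parquet's real on-disk format and not parquet-mr code; all names are hypothetical). Each "row group" is gzip-compressed on its own, a footer-style index records its offset and size, and each split reads only the row groups whose midpoint falls inside its byte range, a midpoint rule similar in spirit to what Hadoop-based Parquet readers use. Swapping gzip for any other codec would not change the split logic.

```python
from __future__ import annotations

import gzip
from dataclasses import dataclass


@dataclass
class RowGroupMeta:
    offset: int  # byte offset of the row group in the toy file
    size: int    # compressed size of the row group in bytes


def write_toy_file(row_groups: list[bytes]) -> tuple[bytes, list[RowGroupMeta]]:
    """Compress each row group independently and record where it lives."""
    body = bytearray()
    index = []
    for rg in row_groups:
        compressed = gzip.compress(rg)
        index.append(RowGroupMeta(offset=len(body), size=len(compressed)))
        body.extend(compressed)
    return bytes(body), index


def assign_to_split(index: list[RowGroupMeta], start: int, length: int) -> list[RowGroupMeta]:
    """Return the row groups whose midpoint lies in [start, start + length).

    Each row group has exactly one midpoint, so it is read by exactly one
    split, wherever the split boundaries happen to fall.
    """
    end = start + length
    return [m for m in index if start <= m.offset + m.size // 2 < end]


def read_row_group(data: bytes, meta: RowGroupMeta) -> bytes:
    # Decompression starts at the row group's own offset; no bytes before it
    # are needed, so the choice of codec does not limit where a split begins.
    return gzip.decompress(data[meta.offset : meta.offset + meta.size])


# Usage: six toy row groups cut into three arbitrary byte-range splits.
groups = [f"row group {i}: ".encode() + bytes(range(256)) * 20 for i in range(6)]
data, index = write_toy_file(groups)
split_len = len(data) // 3 + 1
splits = [assign_to_split(index, s, split_len) for s in range(0, len(data), split_len)]
for n, split in enumerate(splits):
    print(f"split {n}: {len(split)} row group(s)")
```

The split boundaries here are chosen without looking at the data, yet every row group is assigned to exactly one split and can be decompressed from its own offset alone.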

Still, it would be interesting to see how {{bzip2}} performs against the 
currently implemented GZIP, Snappy, and Brotli codecs.
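A first rough comparison can be made with Python's standard library, which ships `gzip` and `bz2` but not Snappy or Brotli (those need third-party packages and are left out here). This is a minimal sketch, assuming a synthetic, repetitive CSV-like payload; real Parquet pages are compressed individually and after encoding, so a meaningful benchmark should use page-sized buffers taken from your own data.

```python
import bz2
import gzip
import time


def measure(name, compress, decompress, payload, repeat=3):
    """Return (name, compression ratio, best compress s, best decompress s)."""
    best_c = best_d = float("inf")
    for _ in range(repeat):
        t0 = time.perf_counter()
        blob = compress(payload)
        t1 = time.perf_counter()
        restored = decompress(blob)
        t2 = time.perf_counter()
        if restored != payload:
            raise ValueError(f"{name} round-trip failed")
        best_c = min(best_c, t1 - t0)
        best_d = min(best_d, t2 - t1)
    return name, len(payload) / len(blob), best_c, best_d


# Hypothetical sample data; substitute real column pages for real numbers.
payload = b"".join(
    f"{i % 1000},user_{i % 37},2017-05-31\n".encode() for i in range(50_000)
)

results = [
    measure("gzip", gzip.compress, gzip.decompress, payload),
    measure("bzip2", bz2.compress, bz2.decompress, payload),
]
for name, ratio, c, d in results:
    print(f"{name:6s} ratio={ratio:5.1f}x "
          f"compress={c * 1000:7.1f} ms decompress={d * 1000:7.1f} ms")
```

Taking the best of several runs reduces timing noise; the ratio and speed trade-off, rather than splittability, is what would decide whether {{bzip2}} is worth adding.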

> bzip2 compression 
> ------------------
>
>                 Key: PARQUET-1011
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1011
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Rajasekhar Konda
>
> Hi,
> I have a requirement to implement Parquet with bzip2 compression because it's 
> splittable. Right now, we can't set bzip2 in Pig; the supported options are: 
> SET parquet.compression none/gzip/SNAPPY; 
> Is there any way to apply bzip2 compression on top of Parquet?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
