[ 
https://issues.apache.org/jira/browse/AVRO-4262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lynch updated AVRO-4262:
---------------------------------
    Description: 
avro-tools recodec should explicitly set the default codec compression level 
for the gzip, xz and zstd codecs when the level parameter is not provided. The 
level currently defaults to -1 for all 3 of these codecs. That is fine for 
gzip, but -1 is an invalid level for xz, and the default for zstd is 3. So you 
expect the default level 3 compression, end up with level -1, and are left 
wondering why compression is so poor.

There are default level constants per codec already defined in the sources.
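
A minimal sketch of the fallback described above, not the actual patch. The 
class and method names are hypothetical, and the default values are hard-coded 
assumptions standing in for the per-codec constants in the Avro sources (-1 
for deflate/gzip, 6 for xz, 3 for zstd). The codec names "deflate" and 
"zstandard" are also assumptions about how avro-tools spells them.

```java
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical helper: picks the compression level recodec should use.
final class RecodecLevelSketch {

    // Assumed per-codec defaults; the real tool would reference the
    // constants already defined in the Avro sources instead of literals.
    static final Map<String, Integer> DEFAULT_LEVELS = Map.of(
            "deflate", -1,   // -1 means "library default" for deflate/gzip
            "xz", 6,         // -1 is rejected by xz, so a real preset is needed
            "zstandard", 3); // zstd default level, per the issue description

    // Returns the explicit level if the user gave one, otherwise the
    // codec's own default. Empty for codecs that take no level.
    static OptionalInt resolveLevel(String codec, Integer requested) {
        if (requested != null) {
            return OptionalInt.of(requested);
        }
        Integer fallback = DEFAULT_LEVELS.get(codec);
        return fallback == null ? OptionalInt.empty() : OptionalInt.of(fallback);
    }

    public static void main(String[] args) {
        // No level given: each codec falls back to its own default.
        System.out.println("xz        -> " + resolveLevel("xz", null));
        System.out.println("zstandard -> " + resolveLevel("zstandard", null));
        // Explicit level given: it is passed through unchanged.
        System.out.println("xz, 9     -> " + resolveLevel("xz", 9));
    }
}
```

The point of the lookup is that a single shared -1 fallback is only valid for 
one of the three codecs, so the default has to be resolved per codec.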

Evidence of before and after
{code:java}
java -jar avro-tools-1.12.1.jar recodec --codec xz input.avro output_conv_xz.avro
Exception in thread "main" org.tukaani.xz.UnsupportedOptionsException: Unsupported preset: -1
at org.tukaani.xz.LZMA2Options.setPreset(LZMA2Options.java:196)
at org.tukaani.xz.LZMA2Options.<init>(LZMA2Options.java:157)
at org.apache.commons.compress.compressors.xz.XZCompressorOutputStream.<init>(XZCompressorOutputStream.java:148)
at org.apache.avro.file.XZCodec.compress(XZCodec.java:62)
at org.apache.avro.file.DataFileStream$DataBlock.compressUsing(DataFileStream.java:384)
at org.apache.avro.file.DataFileWriter.appendAllFrom(DataFileWriter.java:403)
at org.apache.avro.tool.RecodecTool.run(RecodecTool.java:80)
at org.apache.avro.tool.Main.run(Main.java:67)
at org.apache.avro.tool.Main.main(Main.java:56)

Working with patch
java -jar avro-tools-1.13.0-SNAPSHOT.jar recodec --codec xz input.avro output_conv_xz.avro

java -jar avro-tools-1.13.0-SNAPSHOT.jar getmeta output_conv_xz.avro | grep codec
avro.codec    xz
{code}

  was:
avro-tools recodec should explicitly set the default codec compression level 
for the gzip, xz and zstd codecs when the level parameter is not provided. The 
level defaults to -1 which for all 3 of these codecs, which is fine for gzip 
but is an invalid level for xz and 3 is the default for zstd - so when you 
expect the default level 3 compression but end up with -1  level and wondering 
why compression is so poor.

There are default level constants per codec already defined in the sources.

Evidence of before and after
{code:java}
java -jar avro-tools-1.12.1.jar recodec --codec xz input.avro output_conv_xz.avro
Exception in thread "main" org.tukaani.xz.UnsupportedOptionsException: Unsupported preset: -1
at org.tukaani.xz.LZMA2Options.setPreset(LZMA2Options.java:196)
at org.tukaani.xz.LZMA2Options.<init>(LZMA2Options.java:157)
at org.apache.commons.compress.compressors.xz.XZCompressorOutputStream.<init>(XZCompressorOutputStream.java:148)
at org.apache.avro.file.XZCodec.compress(XZCodec.java:62)
at org.apache.avro.file.DataFileStream$DataBlock.compressUsing(DataFileStream.java:384)
at org.apache.avro.file.DataFileWriter.appendAllFrom(DataFileWriter.java:403)
at org.apache.avro.tool.RecodecTool.run(RecodecTool.java:80)
at org.apache.avro.tool.Main.run(Main.java:67)
at org.apache.avro.tool.Main.main(Main.java:56)

Working with patch
java -jar avro-tools-1.13.0-SNAPSHOT.jar recodec --codec xz input.avro output_conv_xz.avro

java -jar avro-tools-1.13.0-SNAPSHOT.jar getmeta output_conv_xz.avro | grep codec
avro.codec    xz
{code}


> avro-tools recodec defaults to level -1 which is an incorrect level for xz 
> and not default for zstd
> ---------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-4262
>                 URL: https://issues.apache.org/jira/browse/AVRO-4262
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 1.12.1
>            Reporter: Jonathan Lynch
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.13.0
>
>         Attachments: 
> 0001-use-default-codec-compression-level-for-gzip-xz-and-.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> avro-tools recodec should explicitly set the default codec compression level 
> for the gzip, xz and zstd codecs when the level parameter is not provided. 
> The level currently defaults to -1 for all 3 of these codecs. That is fine 
> for gzip, but -1 is an invalid level for xz, and the default for zstd is 3. 
> So you expect the default level 3 compression, end up with level -1, and are 
> left wondering why compression is so poor.
> There are default level constants per codec already defined in the sources.
> Evidence of before and after
> {code:java}
> java -jar avro-tools-1.12.1.jar recodec --codec xz input.avro output_conv_xz.avro
> Exception in thread "main" org.tukaani.xz.UnsupportedOptionsException: Unsupported preset: -1
> at org.tukaani.xz.LZMA2Options.setPreset(LZMA2Options.java:196)
> at org.tukaani.xz.LZMA2Options.<init>(LZMA2Options.java:157)
> at org.apache.commons.compress.compressors.xz.XZCompressorOutputStream.<init>(XZCompressorOutputStream.java:148)
> at org.apache.avro.file.XZCodec.compress(XZCodec.java:62)
> at org.apache.avro.file.DataFileStream$DataBlock.compressUsing(DataFileStream.java:384)
> at org.apache.avro.file.DataFileWriter.appendAllFrom(DataFileWriter.java:403)
> at org.apache.avro.tool.RecodecTool.run(RecodecTool.java:80)
> at org.apache.avro.tool.Main.run(Main.java:67)
> at org.apache.avro.tool.Main.main(Main.java:56)
>
> Working with patch
> java -jar avro-tools-1.13.0-SNAPSHOT.jar recodec --codec xz input.avro output_conv_xz.avro
> java -jar avro-tools-1.13.0-SNAPSHOT.jar getmeta output_conv_xz.avro | grep codec
> avro.codec    xz
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
