[ https://issues.apache.org/jira/browse/AVRO-4262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Lynch updated AVRO-4262:
---------------------------------
Description:
avro-tools recodec should explicitly set each codec's default compression
level for the gzip, xz and zstd codecs when the level parameter is not
provided. The level currently defaults to -1 for all 3 of these codecs. That
is fine for gzip, but -1 is an invalid level for xz, and the default for zstd
is 3, so you expect the default level 3 compression, end up with level -1
instead, and are left wondering why compression is so poor.
There are default level constants per codec already defined in the sources.
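A minimal sketch of the intended fallback, not the actual patch: when no level is passed, pick a per-codec default instead of a blanket -1. The class name, method name and constants below are hypothetical stand-ins for the constants already defined in the sources; the xz value of 6 is an assumption based on the usual xz preset default, and the zstd value of 3 is the default stated above.

```java
import java.util.OptionalInt;
import java.util.zip.Deflater;

// Hypothetical helper; names are illustrative and not taken from the Avro sources.
class DefaultCodecLevel {
    // -1 tells the deflate implementation to use its own default, so it is safe for gzip.
    static final int GZIP_DEFAULT_LEVEL = Deflater.DEFAULT_COMPRESSION;
    // Assumed xz default preset; -1 is rejected as "Unsupported preset".
    static final int XZ_DEFAULT_LEVEL = 6;
    // zstd default level as stated in this issue.
    static final int ZSTD_DEFAULT_LEVEL = 3;

    /** Returns the requested level if present, otherwise the codec's own default. */
    static int resolveLevel(String codec, OptionalInt requested) {
        if (requested.isPresent()) {
            return requested.getAsInt();
        }
        switch (codec) {
            case "gzip":
                return GZIP_DEFAULT_LEVEL;
            case "xz":
                return XZ_DEFAULT_LEVEL;
            case "zstd":
                return ZSTD_DEFAULT_LEVEL;
            default:
                throw new IllegalArgumentException("No default level known for codec: " + codec);
        }
    }

    public static void main(String[] args) {
        // Show the level each codec would get when the level parameter is omitted.
        for (String codec : new String[] {"gzip", "xz", "zstd"}) {
            System.out.println(codec + " -> " + resolveLevel(codec, OptionalInt.empty()));
        }
    }
}
```

An explicitly supplied level always wins; the per-codec default only applies when the parameter is absent.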
Evidence of before and after
{code:java}
java -jar avro-tools-1.12.1.jar recodec --codec xz input.avro output_conv_xz.avro
Exception in thread "main" org.tukaani.xz.UnsupportedOptionsException: Unsupported preset: -1
    at org.tukaani.xz.LZMA2Options.setPreset(LZMA2Options.java:196)
    at org.tukaani.xz.LZMA2Options.<init>(LZMA2Options.java:157)
    at org.apache.commons.compress.compressors.xz.XZCompressorOutputStream.<init>(XZCompressorOutputStream.java:148)
    at org.apache.avro.file.XZCodec.compress(XZCodec.java:62)
    at org.apache.avro.file.DataFileStream$DataBlock.compressUsing(DataFileStream.java:384)
    at org.apache.avro.file.DataFileWriter.appendAllFrom(DataFileWriter.java:403)
    at org.apache.avro.tool.RecodecTool.run(RecodecTool.java:80)
    at org.apache.avro.tool.Main.run(Main.java:67)
    at org.apache.avro.tool.Main.main(Main.java:56)

Working with patch
java -jar avro-tools-1.13.0-SNAPSHOT.jar recodec --codec xz input.avro output_conv_xz.avro
java -jar avro-tools-1.13.0-SNAPSHOT.jar getmeta output_conv_xz.avro | grep codec
avro.codec xz
{code}
was:
avro-tools recodec should explicitly set the default codec compression level
for the gzip, xz and zstd codecs when the level parameter is not provided. The
level defaults to -1 which for all 3 of these codecs, which is fine for gzip
but is an invalid level for xz and 3 is the default for zstd - so when you
expect the default level 3 compression but end up with -1 level and wondering
why compression is so poor.
There are default level constants per codec already defined in the sources.
Evidence of before and after
{code:java}
java -jar avro-tools-1.12.1.jar recodec --codec xz input.avro output_conv_xz.avro
Exception in thread "main" org.tukaani.xz.UnsupportedOptionsException: Unsupported preset: -1
    at org.tukaani.xz.LZMA2Options.setPreset(LZMA2Options.java:196)
    at org.tukaani.xz.LZMA2Options.<init>(LZMA2Options.java:157)
    at org.apache.commons.compress.compressors.xz.XZCompressorOutputStream.<init>(XZCompressorOutputStream.java:148)
    at org.apache.avro.file.XZCodec.compress(XZCodec.java:62)
    at org.apache.avro.file.DataFileStream$DataBlock.compressUsing(DataFileStream.java:384)
    at org.apache.avro.file.DataFileWriter.appendAllFrom(DataFileWriter.java:403)
    at org.apache.avro.tool.RecodecTool.run(RecodecTool.java:80)
    at org.apache.avro.tool.Main.run(Main.java:67)
    at org.apache.avro.tool.Main.main(Main.java:56)

Working with patch
java -jar avro-tools-1.13.0-SNAPSHOT.jar recodec --codec xz input.avro output_conv_xz.avro
java -jar avro-tools-1.13.0-SNAPSHOT.jar getmeta output_conv_xz.avro | grep codec
avro.codec xz
{code}
> avro-tools recodec defaults to level -1 which is an incorrect level for xz
> and not default for zstd
> ---------------------------------------------------------------------------------------------------
>
> Key: AVRO-4262
> URL: https://issues.apache.org/jira/browse/AVRO-4262
> Project: Apache Avro
> Issue Type: Bug
> Components: tools
> Affects Versions: 1.12.1
> Reporter: Jonathan Lynch
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.13.0
>
> Attachments:
> 0001-use-default-codec-compression-level-for-gzip-xz-and-.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> avro-tools recodec should explicitly set each codec's default compression
> level for the gzip, xz and zstd codecs when the level parameter is not
> provided. The level currently defaults to -1 for all 3 of these codecs. That
> is fine for gzip, but -1 is an invalid level for xz, and the default for zstd
> is 3, so you expect the default level 3 compression, end up with level -1
> instead, and are left wondering why compression is so poor.
> There are default level constants per codec already defined in the sources.
> Evidence of before and after
> {code:java}
> java -jar avro-tools-1.12.1.jar recodec --codec xz input.avro output_conv_xz.avro
> Exception in thread "main" org.tukaani.xz.UnsupportedOptionsException: Unsupported preset: -1
>     at org.tukaani.xz.LZMA2Options.setPreset(LZMA2Options.java:196)
>     at org.tukaani.xz.LZMA2Options.<init>(LZMA2Options.java:157)
>     at org.apache.commons.compress.compressors.xz.XZCompressorOutputStream.<init>(XZCompressorOutputStream.java:148)
>     at org.apache.avro.file.XZCodec.compress(XZCodec.java:62)
>     at org.apache.avro.file.DataFileStream$DataBlock.compressUsing(DataFileStream.java:384)
>     at org.apache.avro.file.DataFileWriter.appendAllFrom(DataFileWriter.java:403)
>     at org.apache.avro.tool.RecodecTool.run(RecodecTool.java:80)
>     at org.apache.avro.tool.Main.run(Main.java:67)
>     at org.apache.avro.tool.Main.main(Main.java:56)
>
> Working with patch
> java -jar avro-tools-1.13.0-SNAPSHOT.jar recodec --codec xz input.avro output_conv_xz.avro
> java -jar avro-tools-1.13.0-SNAPSHOT.jar getmeta output_conv_xz.avro | grep codec
> avro.codec xz
> {code}