Scott Carey wrote:
> On the specific needs for compression options, I would rather have
> avro.codec.options as a general-purpose container for codec options than
> avro.codec.compression_level. Some codecs, like gzip, have compression
> levels (0 to 9). Others have a set of flags or multiple dimensions of
> options. Each codec can do what it will with avro.codec.options; Deflate,
> for example, could accept "level=[0-9]".
> Additionally, the Codec API can incorporate a
>
>     public String getOptions();
>     public void setOptions(String options);
>
> interface so that file appends can pick up the options that the file was
> created with.
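A minimal sketch of the getOptions/setOptions idea, assuming a hypothetical `OptionsAware` interface (this is not Avro's actual Codec class), a hypothetical `DeflateOptions` codec, a comma-separated "key=value" syntax, and a default level of 6 (zlib's default). It shows how the string written at creation time could be handed back to a fresh codec on append:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CodecOptionsSketch {

    // Hypothetical interface from the proposal; not part of Avro's real Codec API.
    public interface OptionsAware {
        String getOptions();
        void setOptions(String options);
    }

    // Hypothetical deflate codec that understands a single option, "level".
    public static class DeflateOptions implements OptionsAware {
        private int level = 6; // assumed default, equal to zlib's default level

        public int getLevel() {
            return level;
        }

        @Override
        public String getOptions() {
            // Serialized form that would be stored under avro.codec.options.
            return "level=" + level;
        }

        @Override
        public void setOptions(String options) {
            String value = parse(options).get("level");
            if (value != null) {
                int l = Integer.parseInt(value);
                if (l < 0 || l > 9) {
                    throw new IllegalArgumentException("deflate level must be 0-9: " + l);
                }
                level = l;
            }
        }
    }

    // Parses "k1=v1,k2=v2" into a map; the comma separator is an assumption.
    public static Map<String, String> parse(String options) {
        Map<String, String> result = new LinkedHashMap<>();
        if (options == null || options.isEmpty()) {
            return result;
        }
        for (String pair : options.split(",")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("malformed option: " + pair);
            }
            result.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        DeflateOptions writer = new DeflateOptions();
        writer.setOptions("level=5");
        String stored = writer.getOptions(); // would be written to the file header

        // On append, a fresh codec picks up the options the file was created with.
        DeflateOptions appender = new DeflateOptions();
        appender.setOptions(stored);
        System.out.println(appender.getLevel()); // prints 5
    }
}
```

The cost of this design is visible in `parse`: every codec needs its own small grammar for the single options string.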
Strictly speaking, we don't need to include options in the file, since
they don't affect the format. They could even be misleading, since one
might use different compression levels in different append operations,
and I don't see any strong reason to prohibit that.
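The point that levels don't affect the format can be checked directly with `java.util.zip`: data compressed at different levels is read back by the same decoder, which never needs to be told the level. This sketch uses zlib-wrapped streams for simplicity; the class and method names are illustrative only.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class LevelIndependence {

    // Compresses the input at the given deflate level (0-9).
    public static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompresses without any knowledge of the level used to compress.
    public static byte[] decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!inflater.finished()) {
            int n = inflater.inflate(buf);
            if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                break; // truncated or unsupported stream; avoid looping forever
            }
            out.write(buf, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] data = "avro avro avro avro avro avro avro avro".getBytes(StandardCharsets.UTF_8);
        byte[] fast = compress(data, 1);
        byte[] best = compress(data, 9);
        // The compressed bytes may differ, but one decoder handles both.
        System.out.println(Arrays.equals(decompress(fast), data)); // true
        System.out.println(Arrays.equals(decompress(best), data)); // true
    }
}
```

So two append operations using different levels would still produce blocks that any reader can decode.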
A given application could always store its options and reuse them when
appending, e.g., my.gzip.level=5. If they're included in the spec, would
we then prohibit overriding them? If not, what would be the purpose of
putting them in the spec?
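A minimal sketch of the application-side approach, assuming the file's metadata has already been read into a `Map<String, String>` and using the hypothetical key `my.gzip.level` from the example. An explicit per-append override wins, then the stored value, then the JDK's `Deflater.DEFAULT_COMPRESSION`:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.zip.Deflater;

public class AppLevelOptions {

    // Hypothetical application-owned metadata key, not reserved by Avro.
    public static final String LEVEL_KEY = "my.gzip.level";

    // Picks the level for one append: override, then stored value, then default.
    public static int levelForAppend(Map<String, String> fileMeta, Integer override) {
        if (override != null) {
            return override;
        }
        String stored = fileMeta.get(LEVEL_KEY);
        if (stored != null) {
            return Integer.parseInt(stored);
        }
        return Deflater.DEFAULT_COMPRESSION; // -1, i.e. let zlib choose
    }

    public static void main(String[] args) {
        // Stand-in for metadata the application wrote when creating the file.
        Map<String, String> meta = new HashMap<>();
        meta.put(LEVEL_KEY, "5");

        System.out.println(levelForAppend(meta, null));            // 5: reuse stored level
        System.out.println(levelForAppend(meta, 9));               // 9: override for this append
        System.out.println(levelForAppend(new HashMap<>(), null)); // -1: nothing stored
    }
}
```

Nothing in the format stops the override, which is the question raised above about what a spec-level option would add.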
Also, rather than packing all options into a single string that must be
parsed, we might instead reserve avro.codec.<codecName>.* for
codec-specific options. So one might set avro.codec.deflate.level to 5.
The codec name is actually redundant, since only a single codec is
permitted per file, so this could instead be just avro.codec.level
without much risk of confusion.
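A sketch of reading per-option keys, assuming metadata is available as a `Map<String, String>` and that the codec name is stored under `avro.codec`. Checking the qualified `avro.codec.<codecName>.<option>` key first and falling back to the short `avro.codec.<option>` form is an illustrative choice covering both variants above, not something either proposal specifies:

```java
import java.util.HashMap;
import java.util.Map;

public class CodecKeyLookup {

    // Returns a codec option, or null if absent. Each option is its own key,
    // so no options-string grammar is needed.
    public static String codecOption(Map<String, String> meta, String option) {
        String codec = meta.get("avro.codec");
        if (codec != null) {
            String qualified = meta.get("avro.codec." + codec + "." + option);
            if (qualified != null) {
                return qualified;
            }
        }
        return meta.get("avro.codec." + option);
    }

    public static void main(String[] args) {
        Map<String, String> qualified = new HashMap<>();
        qualified.put("avro.codec", "deflate");
        qualified.put("avro.codec.deflate.level", "5");
        System.out.println(codecOption(qualified, "level")); // 5

        Map<String, String> shortForm = new HashMap<>();
        shortForm.put("avro.codec", "deflate");
        shortForm.put("avro.codec.level", "5");
        System.out.println(codecOption(shortForm, "level")); // 5
    }
}
```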
Doug