Scott Carey wrote:
On the specific needs for compression options, I would rather have 
avro.codec.options as a general purpose container for codec options than
avro.codec.compression_level. Some codecs, like gzip, have compression levels from 0 to 9.
Others have a set of flags or multiple dimensions of options. Each codec can do what it
will with avro.codec.options. Deflate can have "level=[0-9]" for values.
Additionally, the Codec API can incorporate a
    public String getOptions();
    public void setOptions(String options);
interface so that file appends can pick up the options that the file was
created with.

Strictly speaking, we don't need to include options in the file, since they don't affect the format. They could even be misleading, since one might use different compression levels in different append operations, and I don't see any strong reason to prohibit that.
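
To illustrate the point that the level doesn't affect the format, here is a minimal sketch using java.util.zip directly rather than Avro's codec classes. The class and method names (DeflateLevelDemo, compress, decompress) are hypothetical, and it uses the default zlib-wrapped stream for simplicity. The decompressor is never told which level was used, yet it reads output written at any level.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo class; not part of Avro.
final class DeflateLevelDemo {

    // Compress input at the given level (0 to 9).
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress without any knowledge of the level used by the writer.
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported stream");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not a valid zlib stream", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("record ").append(i % 7).append(';');
        }
        byte[] data = sb.toString().getBytes(StandardCharsets.UTF_8);

        // Two "appends" written at different levels, read back the same way.
        for (int level : new int[] {1, 9}) {
            byte[] block = compress(data, level);
            boolean same = Arrays.equals(data, decompress(block));
            System.out.println("level " + level + ": " + block.length
                    + " bytes, round trip ok = " + same);
        }
    }
}
```

So a reader only ever needs the codec name; a recorded level would at most be a hint to later writers.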

A given application could always store its own options and reuse them when appending, e.g., my.gzip.level=5. If they're included in the spec, would we then prohibit overriding them? If not, what would be the purpose of putting them in the spec?

Also, rather than packing all options into a single string that must be parsed, we might instead reserve avro.codec.<codecName>.* for codec-specific options, so one might set avro.codec.deflate.level to 5. The codec name is actually redundant there, since only a single codec is permitted per file, so this could perhaps just be avro.codec.level without much fear of confusion.
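
A rough Java sketch of the difference, for illustration only. Everything here is an assumption: the class CodecOptionKeys and its methods are hypothetical, the metadata is modelled as a plain Map<String, String> rather than Avro's actual file metadata, and the comma-separated "key=value" syntax for the packed string is just one guess at what avro.codec.options might hold.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; not part of Avro.
final class CodecOptionKeys {

    // Reserved-prefix approach: collect avro.codec.<codecName>.* entries,
    // stripping the prefix. No value parsing is needed.
    static Map<String, String> optionsFor(Map<String, String> meta, String codecName) {
        String prefix = "avro.codec." + codecName + ".";
        Map<String, String> options = new TreeMap<>();
        for (Map.Entry<String, String> e : meta.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                options.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return options;
    }

    // Packed-string approach: every codec must agree on and parse a syntax.
    // The "k1=v1,k2=v2" form is an assumed syntax, not something specified.
    static Map<String, String> parsePacked(String packed) {
        Map<String, String> options = new TreeMap<>();
        if (packed.isEmpty()) {
            return options;
        }
        for (String pair : packed.split(",")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("expected key=value, got: " + pair);
            }
            options.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        meta.put("avro.codec", "deflate");
        meta.put("avro.codec.deflate.level", "5"); // proposed key, not in the spec

        System.out.println(optionsFor(meta, meta.get("avro.codec")));
        System.out.println(parsePacked("level=5"));
    }
}
```

Both calls yield the same options; the prefix form simply pushes the structure into the metadata keys instead of into a string format that each codec would have to define.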

Doug
