[ https://issues.apache.org/jira/browse/AVRO-1243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573746#comment-13573746 ]
Ted Malaska commented on AVRO-1243:
-----------------------------------
OK, I think I have it. The reader gets the Codec through
DataFileStream.resolveCodec, which has access to all the metadata.
I think I have everything I need to implement a patch.
My first attempt will use the following parameters to read and write with a
Hadoop codec when not running map/reduce:
avro.codec=reflectionCodec
avro.reflection.codec.class=org.apache.avro.hadoop.file.HadoopCodec
mapred.output.compress=true
mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec
When running map/reduce and going through the AvroOutputFormat only the
following parameters will be needed:
mapred.output.compress=true
mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec
So with this we get two things:
1. The normal Avro reader and writer can read files produced by AvroOutputFormat.
2. AvroOutputFormat will behave the same as RCFiles and SequenceFiles when it
comes to compression.
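
As a rough illustration of the two configurations above, here is a minimal Java sketch of how the settings might be combined. It is only one reading of the proposal, not the patch itself: CodecSettingsSketch and hadoopCodecClass are hypothetical names, not Avro or Hadoop APIs, and a plain java.util.Properties stands in for a Hadoop Configuration. The enable flag is read under mapred.output.compress, the name used in the issue description.

```java
import java.util.Optional;
import java.util.Properties;

/** Hypothetical helper (not an Avro or Hadoop class) that mirrors the proposed settings. */
final class CodecSettingsSketch {

    /**
     * Returns the Hadoop codec class name to use, or empty if compression is not configured.
     * viaOutputFormat is true for the map/reduce path (AvroOutputFormat) and false for the
     * plain reader/writer path, which additionally needs the two avro.* keys.
     */
    static Optional<String> hadoopCodecClass(Properties conf, boolean viaOutputFormat) {
        boolean compress =
            Boolean.parseBoolean(conf.getProperty("mapred.output.compress", "false"));
        if (!compress) {
            return Optional.empty();
        }
        if (!viaOutputFormat) {
            // Outside map/reduce the proposal also asks for the reflection codec settings.
            boolean reflective =
                "reflectionCodec".equals(conf.getProperty("avro.codec"))
                    && conf.getProperty("avro.reflection.codec.class") != null;
            if (!reflective) {
                return Optional.empty();
            }
        }
        return Optional.ofNullable(conf.getProperty("mapred.output.compression.codec"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("mapred.output.compress", "true");
        conf.setProperty("mapred.output.compression.codec",
            "org.apache.hadoop.io.compress.SnappyCodec");
        // Map/reduce path: the two mapred.* keys are enough.
        System.out.println(hadoopCodecClass(conf, true));
    }
}
```

In this sketch the map/reduce path needs only the two mapred.* keys, while the standalone path returns nothing until avro.codec and avro.reflection.codec.class are set as well.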
Let me know what you think. I have to finish some work first, then I will try
to get this done over the weekend.
> Support all compression codecs
> ------------------------------
>
> Key: AVRO-1243
> URL: https://issues.apache.org/jira/browse/AVRO-1243
> Project: Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.7.3
> Reporter: Ted Malaska
> Priority: Minor
>
> I may be reading this wrong, but at this time
> org.apache.avro.file.CodecFactory only supports the null, deflate, and snappy
> compression codecs.
> I would like to change the fromString method to use
> Class.forName(codec).newInstance(); when the codec is not found in the
> REGISTERED map, but before the AvroRuntimeException is thrown.
> Here are some of my supporting thoughts:
> 1. This should not introduce much slowness because it will only be called
> at initialization.
> 2. This will allow support for GZip, BZip2, and LZO without adding more
> dependencies to the maven pom file.
> 3. This will allow for a future JIRA I would like to do that would allow
> AvroOutputFormat to use the following configs:
> mapred.output.compress and mapred.output.compression.codec
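
The fallback described in the issue (check the registered codecs first, then try loading the name as a class before giving up) could look roughly like the following. This is a self-contained sketch under stated assumptions, not Avro's code: CodecFactoryLike, NullCodec, and ExampleCodec are hypothetical stand-ins for the real codec types, IllegalArgumentException stands in for AvroRuntimeException, and getDeclaredConstructor().newInstance() is used in place of the Class.newInstance() call mentioned above, which newer JDKs mark as deprecated.

```java
import java.util.HashMap;
import java.util.Map;

final class ReflectiveCodecLookupSketch {

    /** Hypothetical stand-in for a codec factory type; not Avro's CodecFactory. */
    interface CodecFactoryLike {
        String name();
    }

    /** A codec that is registered up front under a short name. */
    static final class NullCodec implements CodecFactoryLike {
        public String name() { return "null"; }
    }

    /** A codec that is NOT registered and is only reachable through its class name. */
    static final class ExampleCodec implements CodecFactoryLike {
        public ExampleCodec() {}
        public String name() { return "example"; }
    }

    private static final Map<String, CodecFactoryLike> REGISTERED = new HashMap<>();
    static {
        REGISTERED.put("null", new NullCodec());
    }

    static CodecFactoryLike fromString(String s) {
        // 1. Prefer the registered short names.
        CodecFactoryLike known = REGISTERED.get(s);
        if (known != null) {
            return known;
        }
        // 2. Otherwise treat the string as a class name and instantiate it reflectively.
        try {
            Class<?> cls = Class.forName(s);
            return (CodecFactoryLike) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            // 3. Only now report the codec as unknown.
            throw new IllegalArgumentException("Unrecognized codec: " + s, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fromString("null").name());
        System.out.println(fromString(ExampleCodec.class.getName()).name());
    }
}
```

Because the reflective path runs only after the map lookup misses, registered names cost the same as before, and a class that cannot be loaded or is not a codec still ends in the same kind of error the caller would have seen without the fallback.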