David Arthur created AVRO-1339:
----------------------------------
Summary: AvroSequenceFile is always uncompressed
Key: AVRO-1339
URL: https://issues.apache.org/jira/browse/AVRO-1339
Project: Avro
Issue Type: Bug
Reporter: David Arthur
It appears that AvroSequenceFile does not pass the compression type or codec
down to the SequenceFile.Writer. This is because AvroSequenceFile.Writer
calls SequenceFile.Writer's public constructor directly rather than using one
of the SequenceFile.createWriter factory methods:
https://github.com/apache/avro/blob/trunk/lang/java/mapred/src/main/java/org/apache/avro/hadoop/io/AvroSequenceFile.java#L532
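To illustrate why the direct constructor call loses compression, here is a small stand-alone model of the pattern. It is only a sketch: SketchWriter, SketchCompression and the two subclasses are hypothetical stand-ins, not the Hadoop classes, and it assumes (as the behaviour described above suggests) that the factory picks a compressing writer subclass while the plain constructor always yields the uncompressed one.

```java
// Hypothetical stand-ins for illustration only; these are NOT the Hadoop classes.
enum SketchCompression { NONE, RECORD, BLOCK }

class SketchWriter {
    // Models the public constructor: it has no compression parameter,
    // so the result is always the plain, uncompressed writer.
    SketchWriter() {}

    String describe() {
        return "uncompressed";
    }

    // Models a createWriter-style factory: the compression type decides
    // which writer implementation is returned.
    static SketchWriter createWriter(SketchCompression type) {
        switch (type) {
            case RECORD:
                return new SketchRecordCompressWriter();
            case BLOCK:
                return new SketchBlockCompressWriter();
            default:
                return new SketchWriter();
        }
    }
}

class SketchRecordCompressWriter extends SketchWriter {
    @Override
    String describe() {
        return "record-compressed";
    }
}

class SketchBlockCompressWriter extends SketchWriter {
    @Override
    String describe() {
        return "block-compressed";
    }
}
```

In this model, a wrapper that always calls new SketchWriter() ignores whatever compression type the caller configured, while one that delegates to SketchWriter.createWriter(type) honours it.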
Here is a bit of workaround code that I came up with:
{code:java}
AvroSequenceFile.Writer.Options options = new AvroSequenceFile.Writer.Options()
    .withConfiguration(hdfsInfo.getConf())
    .withFileSystem(hdfsInfo.getFileSystem())
    .withOutputPath(hdfsInfo.getPath())
    .withCompressionType(configuration.getCompressionType())
    .withCompressionCodec(configuration.getCompressionCodec().getCodec())
    .withProgressable(new Progressable() {
        @Override
        public void progress() {
        }
    })
    .withKeySchema(configuration.getKeySchema())
    .withValueSchema(configuration.getValueSchema());

// Have to do this here b/c it's hidden in a private method :(
Metadata metadata = options.getMetadata();
if (null != configuration.getKeySchema()) {
    metadata.set(AvroSequenceFile.METADATA_FIELD_KEY_SCHEMA,
        new Text(configuration.getKeySchema().toString()));
}
if (null != configuration.getValueSchema()) {
    metadata.set(AvroSequenceFile.METADATA_FIELD_VALUE_SCHEMA,
        new Text(configuration.getValueSchema().toString()));
}

return SequenceFile.createWriter(
    options.getFileSystem(),
    options.getConfigurationWithAvroSerialization(),
    options.getOutputPath(),
    options.getKeyClass(),
    options.getValueClass(),
    options.getBufferSizeBytes(),
    options.getReplicationFactor(),
    options.getBlockSizeBytes(),
    options.getCompressionType(),
    options.getCompressionCodec(),
    options.getProgressable(),
    metadata);
{code}
I used this code to write a BZip2 block-compressed sequence file, and was able
to read it back using the Avro mapreduce classes just fine.
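One way to confirm that an output file really is compressed is to look at the SequenceFile header. The following is a rough sketch that uses only the JDK. It assumes the header layout as I understand it for versions 4 and later: the bytes 'S', 'E', 'Q', a version byte, the key and value class names as length-prefixed strings, then two boolean bytes (compressed, block-compressed), followed by the codec class name when compression is on (version 5 and later). SeqHeaderCheck is a hypothetical helper name, and the sketch only handles class names shorter than 128 bytes, where the length prefix is a single byte.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of Avro or Hadoop.
final class SeqHeaderCheck {

    /** Returns "NONE", or "RECORD"/"BLOCK" followed by the codec class name if present. */
    static String describeCompression(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);

        byte[] magic = new byte[3];
        in.readFully(magic);
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            throw new IOException("Not a SequenceFile (bad magic bytes)");
        }

        int version = in.readUnsignedByte();
        if (version < 4) {
            throw new IOException("Header version " + version + " is not handled by this sketch");
        }

        readShortString(in); // key class name, not needed here
        readShortString(in); // value class name, not needed here

        boolean compressed = in.readBoolean();
        boolean blockCompressed = in.readBoolean();
        if (!compressed) {
            return "NONE";
        }

        String type = blockCompressed ? "BLOCK" : "RECORD";
        // Assumption: the codec class name is only stored from version 5 onwards.
        return version >= 5 ? type + " " + readShortString(in) : type;
    }

    // Reads a string whose length prefix fits in one non-negative byte.
    private static String readShortString(DataInputStream in) throws IOException {
        int length = in.readByte();
        if (length < 0) {
            throw new IOException("Multi-byte length prefix is not handled by this sketch");
        }
        byte[] bytes = new byte[length];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

For a file on HDFS, the stream returned by FileSystem.open(path) could be passed in; for a local copy, java.nio.file.Files.newInputStream(path) would do. With the bug described above I would expect the result to be "NONE" regardless of the configured compression type.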