Hi,

I am running a compaction job on Snappy-compressed Avro files. Although the job
completes successfully, the output files are not compressed. My configuration for
the compaction job is as follows:

fs.uri=hdfs://hdp-ubuntu-hadoop-mgr-1:8020
writer.fs.uri=${fs.uri}

job.name=CompactKafkaMR
job.group=PNDA

mr.job.max.mappers=5

compaction.datasets.finder=gobblin.compaction.dataset.TimeBasedSubDirDatasetsFinder
compaction.input.dir=/user/pnda/PNDA_datasets/datasets
compaction.dest.dir=/user/pnda/PNDA_datasets/compacted8
compaction.input.subdir=.
compaction.dest.subdir=.
compaction.timebased.folder.pattern='year='YYYY/'month='MM/'day='dd/'hour='HH
compaction.timebased.max.time.ago=10d
compaction.timebased.min.time.ago=1h
compaction.input.deduplicated=true
compaction.output.deduplicated=true
compaction.jobprops.creator.class=gobblin.compaction.mapreduce.MRCompactorTimeBasedJobPropCreator
compaction.job.runner.class=gobblin.compaction.mapreduce.avro.MRCompactorAvroKeyDedupJobRunner
compaction.timezone=UTC
compaction.job.overwrite.output.dir=true
compaction.recompact.from.input.for.late.data=true
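
For reference, the codec of an Avro container file can be read from its metadata,
roughly like the minimal sketch below (the class name and the file-path argument are
placeholders of mine, not part of the job). A missing or "null" avro.codec entry on
the compacted files is what I mean by "not compressed" above.

import java.io.File;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class PrintAvroCodec {
    public static void main(String[] args) throws Exception {
        // Prints the container-file codec ("snappy", "deflate", ...).
        // For an uncompressed file the "avro.codec" entry is missing or "null".
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(new File(args[0]), new GenericDatumReader<GenericRecord>())) {
            System.out.println(reader.getMetaString("avro.codec"));
        }
    }
}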


I have tried the following options, without success:

mapreduce.output.fileoutputformat.compress=true
mapreduce.output.fileoutputformat.compress.codec=hadoop.io.compress.SnappyCodec
mapreduce.output.fileoutputformat.compress.type=RECORD

writer.output.format=AVRO
writer.codec.type=SNAPPY
writer.builder.class=gobblin.writer.AvroDataWriterBuilder
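
For comparison, this is roughly how I understand Avro output compression is enabled
on a plain MapReduce job that writes through AvroKeyOutputFormat (a sketch of my own,
assuming Avro's mapreduce API; the class and method names are illustrative, and I have
not confirmed that the Gobblin compaction runner wires up its output the same way):

import org.apache.avro.Schema;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.mapreduce.AvroKeyOutputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AvroSnappyOutputExample {
    // Configure a job so AvroKeyOutputFormat writes Snappy-compressed container files.
    public static Job newJob(Configuration conf, Schema outputSchema) throws Exception {
        Job job = Job.getInstance(conf, "avro-snappy-output");
        job.setOutputFormatClass(AvroKeyOutputFormat.class);
        AvroJob.setOutputKeySchema(job, outputSchema);

        // Avro container files carry their own codec: once output compression is
        // enabled, the Avro output format looks at the Avro codec key
        // ("avro.output.codec") rather than the generic Hadoop codec class.
        FileOutputFormat.setCompressOutput(job, true);
        job.getConfiguration().set("avro.output.codec", "snappy");
        return job;
    }
}

If there is an equivalent compaction or writer property that ends up setting
avro.output.codec for the compaction job, that is probably what I am missing.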


Kindly let me know how to proceed. Am I missing any configuration parameters?

Thanks
Sushant Pandey
