Hi,

I am running a compaction job on Snappy-compressed Avro files. Although the job completes successfully, the output is not compressed. Below is my configuration for the compaction job:
fs.uri=hdfs://hdp-ubuntu-hadoop-mgr-1:8020
writer.fs.uri=${fs.uri}
job.name=CompactKafkaMR
job.group=PNDA
mr.job.max.mappers=5
compaction.datasets.finder=gobblin.compaction.dataset.TimeBasedSubDirDatasetsFinder
compaction.input.dir=/user/pnda/PNDA_datasets/datasets
compaction.dest.dir=/user/pnda/PNDA_datasets/compacted8
compaction.input.subdir=.
compaction.dest.subdir=.
compaction.timebased.folder.pattern='year='YYYY/'month='MM/'day='dd/'hour='HH
compaction.timebased.max.time.ago=10d
compaction.timebased.min.time.ago=1h
compaction.input.deduplicated=true
compaction.output.deduplicated=true
compaction.jobprops.creator.class=gobblin.compaction.mapreduce.MRCompactorTimeBasedJobPropCreator
compaction.job.runner.class=gobblin.compaction.mapreduce.avro.MRCompactorAvroKeyDedupJobRunner
compaction.timezone=UTC
compaction.job.overwrite.output.dir=true
compaction.recompact.from.input.for.late.data=true
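For reference, this is how I concluded the output is uncompressed: an Avro container file records its codec in the header metadata under "avro.codec" ("null" means no compression), so it can be checked directly. A minimal sketch that parses the header (the file path in the usage comment is hypothetical):

```python
MAGIC = b"Obj\x01"  # Avro object container file magic bytes

def _decode_long(buf, pos):
    """Decode one zigzag-varint-encoded long; return (value, new_pos)."""
    acc, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        acc |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1), pos

def avro_codec(header):
    """Return the codec name recorded in Avro container header bytes."""
    if header[:4] != MAGIC:
        raise ValueError("not an Avro container file")
    pos, meta = 4, {}
    # Header metadata is a map<string, bytes>: blocks of entries,
    # terminated by a zero block count.
    count, pos = _decode_long(header, pos)
    while count != 0:
        if count < 0:  # a negative block count is followed by a byte size
            count = -count
            _, pos = _decode_long(header, pos)
        for _ in range(count):
            n, pos = _decode_long(header, pos)
            key = header[pos:pos + n].decode()
            pos += n
            n, pos = _decode_long(header, pos)
            meta[key] = header[pos:pos + n]
            pos += n
        count, pos = _decode_long(header, pos)
    return meta.get("avro.codec", b"null").decode()

# Usage (hypothetical path): print the codec of one compacted file.
# with open("/path/to/compacted/part-r-00000.avro", "rb") as f:
#     print(avro_codec(f.read(4096)))
```

On every output file in compacted8 this reports "null", while the input files report "snappy".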
I tried the following options, without success:
mapreduce.output.fileoutputformat.compress=true
mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec
mapreduce.output.fileoutputformat.compress.type=RECORD
writer.output.format=AVRO
writer.codec.type=SNAPPY
writer.builder.class=gobblin.writer.AvroDataWriterBuilder
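One guess on my part: since Avro container files carry their own codec, the Avro MapReduce output formats read the codec name from the `avro.output.codec` property (values such as null, deflate, snappy) rather than the generic Hadoop codec class. I am not sure whether the Gobblin compaction job forwards this property to the MR job, but should I be setting something like this instead?

```properties
mapreduce.output.fileoutputformat.compress=true
avro.output.codec=snappy
```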
Kindly let me know how to proceed. Am I missing some configuration parameters?

Thanks,
Sushant Pandey