[ https://issues.apache.org/jira/browse/HADOOP-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233987#comment-17233987 ]
Eric Payne commented on HADOOP-17381:
-------------------------------------
One hacky approach that might work, and that would extend to all codecs, is to
create an explicit codec for intermediate outputs, such as
IntermediateOutputCodec, that has its own config namespace under a prefix such
as io.compression.codec.intermediate. Users would then configure the same codec
used for the final output as a sub-codec of IntermediateOutputCodec.
IntermediateOutputCodec would gather all configs with the
io.compression.codec.intermediate prefix, strip the prefix, and reapply them to
a new Configuration that becomes the config object for the sub-codec. The
sub-codec could then have different settings than the output codec even though
the same codec type is used underneath.
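
To make the gather/strip/reapply step concrete, here is a minimal sketch that
uses a plain Map in place of Hadoop's Configuration so it stays self-contained.
The class name IntermediatePrefixSketch, the trailing dot on the prefix, and the
choice to copy the non-prefixed settings as defaults before overlaying the
stripped ones are all assumptions for illustration; none of this is existing
Hadoop code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the prefix-stripping idea; not part of Hadoop. */
final class IntermediatePrefixSketch {
    // Prefix from the comment, with a trailing dot added (assumption).
    static final String PREFIX = "io.compression.codec.intermediate.";

    /**
     * Builds the settings a sub-codec would see: non-prefixed keys are kept
     * as defaults, then every prefixed key is re-added without its prefix so
     * that the intermediate-specific value wins.
     */
    static Map<String, String> subCodecConf(Map<String, String> base) {
        Map<String, String> result = new LinkedHashMap<>();
        // First pass: carry over settings that are not in the namespace.
        for (Map.Entry<String, String> e : base.entrySet()) {
            if (!e.getKey().startsWith(PREFIX)) {
                result.put(e.getKey(), e.getValue());
            }
        }
        // Second pass: strip the prefix and let these values override.
        for (Map.Entry<String, String> e : base.entrySet()) {
            String key = e.getKey();
            if (key.startsWith(PREFIX) && key.length() > PREFIX.length()) {
                result.put(key.substring(PREFIX.length()), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new LinkedHashMap<>();
        // Level intended for the final output codec.
        jobConf.put("io.compression.codec.zstd.level", "9");
        // Level intended for intermediate data only.
        jobConf.put(PREFIX + "io.compression.codec.zstd.level", "1");
        System.out.println(subCodecConf(jobConf));
    }
}
```

In a real implementation the returned settings would presumably be copied into
a fresh Configuration and handed to the sub-codec; how that interacts with
CodecPool is the open question raised in the issue description.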
> Ability to specify separate compression settings when intermediate and final
> output use the same codec
> ------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-17381
> URL: https://issues.apache.org/jira/browse/HADOOP-17381
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Eric Payne
> Priority: Major
>
> Users may want to use the ZStandard codec for both intermediate data and
> final output data, yet specify different compression levels for the two use
> cases.
> It would be nice if we could create a "meta codec" like IntermediateCodec
> that uses conf prefix techniques, as Oozie does with oozie.launcher for the
> Oozie launcher configs, to create a custom config namespace for arbitrary
> codec settings that apply to the intermediate codec separately from the
> final output codec, even if the same underlying codec is used for both.
> However, codecs don't allow a configuration to be passed when obtaining a
> codec stream, and I think we would have to bypass the CodecPool entirely to
> be able to pass a custom conf to an arbitrary codec.
> Another approach is to skip trying to generalize the solution and
> specifically focus on ZStandard. It would be easy to create a wrapper codec
> around the existing ZStandardCompressor and ZStandardDecompressor which take
> the relevant parameters directly in their constructors.
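
As a rough illustration of the "parameters in the constructor" approach from
the description above, the sketch below fixes the compression level when the
wrapper is constructed instead of reading it from a shared configuration. To
keep it self-contained it uses java.util.zip.Deflater as a stand-in for
ZStandardCompressor; the class name FixedLevelCodecSketch and its methods are
hypothetical, and a real wrapper would implement Hadoop's codec interfaces
rather than work on byte arrays.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical wrapper; Deflater stands in for ZStandardCompressor. */
final class FixedLevelCodecSketch {
    private final int level;

    // The level is supplied here, not looked up in a shared configuration.
    FixedLevelCodecSketch(int level) {
        this.level = level;
    }

    byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native resources
        }
    }

    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported input");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt input", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] data = "intermediate and final output".getBytes(StandardCharsets.UTF_8);
        // Two instances of the same codec type with different levels.
        FixedLevelCodecSketch intermediate = new FixedLevelCodecSketch(Deflater.BEST_SPEED);
        FixedLevelCodecSketch finalOutput = new FixedLevelCodecSketch(Deflater.BEST_COMPRESSION);
        System.out.println(intermediate.compress(data).length);
        System.out.println(finalOutput.compress(data).length);
    }
}
```

The point of the design is that the intermediate and final-output instances
share one codec type but never share a level, which avoids the need to pass a
custom conf through CodecPool.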