[jira] [Commented] (HADOOP-14668) Remove Configurable Default Sequence File Compression Type

Chen Liang (JIRA) Wed, 19 Jul 2017 13:06:42 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-14668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093717#comment-16093717
 ]


Chen Liang commented on HADOOP-14668:
-------------------------------------

This turns out to be a bit more complex than it first appears. The issue is 
that {{mapreduce.output.fileoutputformat.compress.type}} seems to be a 
deprecated key (it is listed in {{DeprecatedProperties.md}}), yet it is still 
being used in several places. Meanwhile, {{io.seqfile.compression.type}} seems 
to be the right key to use, but it is not used anywhere except internally in 
{{SequenceFile.java}}. For compatibility reasons, we will need to keep both 
keys and check the values of both.

So maybe one way to go here is to check:
1. if both properties are set, use the value of {{io.seqfile.compression.type}}.
2. if only one of the two properties is set, use that value, and set the value 
of the other key to the same value.
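
To make the two rules above concrete, here is a minimal sketch of the lookup. 
It is not the actual patch: the class name {{SeqFileCompressionKeys}} and the 
{{resolve}} method are hypothetical, and a plain {{Map}} stands in for Hadoop's 
{{Configuration}} so the example is self-contained. The fallback to RECORD when 
neither key is set is an assumption that mirrors what 
{{getDefaultCompressionType}} does today.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of Hadoop.
final class SeqFileCompressionKeys {
    static final String IO_KEY = "io.seqfile.compression.type";
    static final String MR_KEY =
        "mapreduce.output.fileoutputformat.compress.type";
    // Assumed fallback, mirroring the current getDefaultCompressionType.
    static final String DEFAULT_TYPE = "RECORD";

    /**
     * Returns the effective compression type name. If only one key is set,
     * its value is copied to the other key (rule 2). If both are set,
     * IO_KEY wins and MR_KEY is left untouched (rule 1).
     */
    static String resolve(Map<String, String> conf) {
        String io = conf.get(IO_KEY);
        String mr = conf.get(MR_KEY);
        if (io != null) {
            if (mr == null) {
                conf.put(MR_KEY, io); // only IO_KEY was set: mirror it
            }
            return io;
        }
        if (mr != null) {
            conf.put(IO_KEY, mr); // only MR_KEY was set: mirror it
            return mr;
        }
        return DEFAULT_TYPE; // neither key set
    }
}
```

Leaving the other key untouched when both are set is one possible reading of 
rule 1; overwriting it with the winning value would be an equally plausible 
variant.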

Since this seems to be used mainly by MapReduce, I'll leave this proposal open 
for a while before submitting a patch, to see if other people have any 
thoughts.

> Remove Configurable Default Sequence File Compression Type
> ----------------------------------------------------------
>
>                 Key: HADOOP-14668
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14668
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 2.8.1, 3.0.0-alpha3
>            Reporter: BELUGA BEHR
>            Assignee: Chen Liang
>            Priority: Trivial
>
> It is confusing to have two different ways to set the Sequence File 
> compression type.
> In a basic configuration, I can set 
> _mapreduce.output.fileoutputformat.compress.type_ or 
> _io.seqfile.compression.type_.  If I would like to set a default value, I 
> should set it by setting the cluster environment's mapred-site.xml file 
> setting for _mapreduce.output.fileoutputformat.compress.type_.
> Please remove references to this magic string _io.seqfile.compression.type_, 
> remove the {{setDefaultCompressionType}} method, and have 
> {{getDefaultCompressionType}} return value hard-coded to 
> {{CompressionType.RECORD}}.  This will make administration easier as I have 
> to only interrogate one configuration.
> {code:title=org.apache.hadoop.io.SequenceFile}
>   /**
>    * Get the compression type for the reduce outputs
>    * @param job the job config to look in
>    * @return the kind of compression to use
>    */
>   static public CompressionType getDefaultCompressionType(Configuration job) {
>     String name = job.get("io.seqfile.compression.type");
>     return name == null ? CompressionType.RECORD : 
>       CompressionType.valueOf(name);
>   }
>   
>   /**
>    * Set the default compression type for sequence files.
>    * @param job the configuration to modify
>    * @param val the new compression type (none, block, record)
>    */
>   static public void setDefaultCompressionType(Configuration job, 
>                                                CompressionType val) {
>     job.set("io.seqfile.compression.type", val.toString());
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

