CompressionCodecFactory returns unconfigured GZipCodec if io.compression.codecs is not set
------------------------------------------------------------------------------------------

                 Key: HADOOP-7196
                 URL: https://issues.apache.org/jira/browse/HADOOP-7196
             Project: Hadoop Common
          Issue Type: Bug
    Affects Versions: 0.20.2
            Reporter: Peter Voss


If the {{io.compression.codecs}} property is not set, GzipCodec is added using this code:

{code:java}
List<Class<? extends CompressionCodec>> codecClasses = getCodecClasses(conf);
if (codecClasses == null) {
  addCodec(new GzipCodec());
  addCodec(new DefaultCodec());
} else {
  Iterator<Class<? extends CompressionCodec>> itr = codecClasses.iterator();
  while (itr.hasNext()) {
    CompressionCodec codec = ReflectionUtils.newInstance(itr.next(), conf);
    addCodec(codec);
  }
}
{code}

This leaves GzipCodec unconfigured, because the fallback branch creates it with {{new GzipCodec()}} and never passes {{conf}} to it. If the codec is listed in the {{io.compression.codecs}} property, it is configured properly through ReflectionUtils.newInstance(..., conf).

I have seen a lot of NPEs on systems that don't have this property set when using a LineRecordReader (which internally gets the codec from CompressionCodecFactory).

I suggest using {{org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec}} as the default value for {{io.compression.codecs}}, instead of keeping a separate code path for the case where this property is not set.
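
The suggested fix, a default value for {{io.compression.codecs}} so that every codec goes through the same configuring code path, can be sketched with a small self-contained model. This is only an illustration under stated assumptions, not a patch: {{CodecFactorySketch}}, {{ModelCodec}}, {{ModelGzipCodec}}, {{ModelDefaultCodec}} and {{loadCodecs}} are hypothetical stand-ins, a plain {{Map}} stands in for the Hadoop Configuration, and the local {{newInstance}} only mimics the instantiate-then-configure behaviour of ReflectionUtils.newInstance(..., conf).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model for illustration only; none of these are the real Hadoop classes.
class CodecFactorySketch {

    static final String CODECS_KEY = "io.compression.codecs";

    // Assumed default: the same two codecs the fallback branch adds today.
    static final String DEFAULT_CODECS =
        ModelGzipCodec.class.getName() + "," + ModelDefaultCodec.class.getName();

    /** Hypothetical stand-in for a configurable compression codec. */
    interface ModelCodec {
        void setConf(Map<String, String> conf);
        Map<String, String> getConf();
    }

    static class ModelDefaultCodec implements ModelCodec {
        private Map<String, String> conf; // stays null unless setConf is called
        public void setConf(Map<String, String> conf) { this.conf = conf; }
        public Map<String, String> getConf() { return conf; }
    }

    static class ModelGzipCodec extends ModelDefaultCodec { }

    /** Mimics ReflectionUtils.newInstance(cls, conf): instantiate, then configure. */
    static ModelCodec newInstance(Class<? extends ModelCodec> cls, Map<String, String> conf) {
        try {
            ModelCodec codec = cls.getDeclaredConstructor().newInstance();
            codec.setConf(conf);
            return codec;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + cls.getName(), e);
        }
    }

    /** Single code path: an unset property falls back to DEFAULT_CODECS. */
    static List<ModelCodec> loadCodecs(Map<String, String> conf) {
        String names = conf.getOrDefault(CODECS_KEY, DEFAULT_CODECS);
        List<ModelCodec> codecs = new ArrayList<>();
        for (String name : names.split(",")) {
            try {
                Class<? extends ModelCodec> cls =
                    Class.forName(name.trim()).asSubclass(ModelCodec.class);
                codecs.add(newInstance(cls, conf));
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException("Unknown codec class " + name, e);
            }
        }
        return codecs;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>(); // property deliberately left unset
        for (ModelCodec codec : loadCodecs(conf)) {
            System.out.println(codec.getClass().getSimpleName()
                + " configured: " + (codec.getConf() != null));
        }
    }
}
```

In this model a codec created with a bare {{new ModelGzipCodec()}} still has a null configuration, which corresponds to the state that the fallback branch quoted above leaves GzipCodec in.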