Nicholas Chammas created HADOOP-17562:
-----------------------------------------

             Summary: Provide mechanism for explicitly specifying the compression codec for input files
                 Key: HADOOP-17562
                 URL: https://issues.apache.org/jira/browse/HADOOP-17562
             Project: Hadoop Common
          Issue Type: Improvement
            Reporter: Nicholas Chammas


I come to you via SPARK-29280.

I am looking for the file _input_ equivalents of the following settings:
{code:java}
mapreduce.output.fileoutputformat.compress
mapreduce.map.output.compress{code}
Right now, I understand that Hadoop infers the codec to use when reading a file from the file's extension.

However, in some cases the files may have the incorrect extension or no extension. There are links to some examples from SPARK-29280.

Ideally, you should be able to explicitly specify the codec to use to read those files. I don't believe that's possible today. Instead, the current workaround appears to be to [create a custom codec class|https://stackoverflow.com/a/17152167/877069] and override the getDefaultExtension method to specify the extension to expect.

Does it make sense to offer an explicit way to select the compression codec for file input, mirroring how things work for file output?
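
To make the request concrete, here is a minimal stand-alone sketch of the difference between inferring a codec from the file extension and selecting it explicitly. It uses only java.util.zip and does not touch Hadoop's API. The class CodecSelectionSketch, the Codec enum, and the read and gzip helpers are all hypothetical names invented for this illustration, and the ".gz" rule is a simplified stand-in for the extension lookup described above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical model of the proposal; none of these names exist in Hadoop.
final class CodecSelectionSketch {

    enum Codec { NONE, GZIP }

    // Simplified stand-in for extension-based inference.
    static Codec inferFromExtension(String fileName) {
        return fileName.endsWith(".gz") ? Codec.GZIP : Codec.NONE;
    }

    // Decodes the bytes as UTF-8 text. An explicit codec, when given,
    // takes precedence over whatever the file name suggests.
    static String read(String fileName, byte[] raw, Optional<Codec> explicitCodec) {
        Codec codec = explicitCodec.orElseGet(() -> inferFromExtension(fileName));
        try (InputStream in = codec == Codec.GZIP
                ? new GZIPInputStream(new ByteArrayInputStream(raw))
                : new ByteArrayInputStream(raw)) {
            ByteArrayOutputStream decoded = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                decoded.write(chunk, 0, n);
            }
            return new String(decoded.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper that produces gzip-compressed test data in memory.
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = gzip("hello");
        // Gzip content stored under a name with no extension:
        // inference treats it as uncompressed and returns garbage.
        System.out.println(read("part-00000", data, Optional.empty()).equals("hello"));
        // Naming the codec explicitly recovers the original text.
        System.out.println(read("part-00000", data, Optional.of(Codec.GZIP)));
    }
}
```

In this model the explicit codec is an optional override, so existing extension-based behaviour stays the default when nothing is specified. Whether a real Hadoop setting should work the same way is a design question for the issue, not something this sketch settles.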