Nicholas Chammas created HADOOP-17562:
-----------------------------------------
Summary: Provide mechanism for explicitly specifying the
compression codec for input files
Key: HADOOP-17562
URL: https://issues.apache.org/jira/browse/HADOOP-17562
Project: Hadoop Common
Issue Type: Improvement
Reporter: Nicholas Chammas
I come to you via SPARK-29280.
I am looking for the file _input_ equivalents of the following settings:
{code:java}
mapreduce.output.fileoutputformat.compress
mapreduce.map.output.compress
{code}
As I understand it, Hadoop currently infers the codec to use when reading a
file from the file's extension.
However, in some cases files have an incorrect extension or no extension at
all. SPARK-29280 links to some examples.
Ideally, you should be able to explicitly specify the codec to use to read
those files. I don't believe that's possible today. Instead, the current
workaround appears to be to [create a custom codec
class|https://stackoverflow.com/a/17152167/877069] and override the
getDefaultExtension method to specify the extension to expect.
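To illustrate why that workaround works, here is a minimal, self-contained sketch of suffix-based codec lookup. It is a simplified model, not Hadoop's code: the class names (CodecInferenceSketch, GzipLikeCodec, GzipForDatFiles) and the inferCodec helper are hypothetical stand-ins, and the ".dat" extension is just an example. The only idea borrowed from Hadoop is that a codec is matched against the file name by its default extension.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Simplified model of extension-based codec lookup (hypothetical names, not Hadoop classes).
class CodecInferenceSketch {

    /** Stand-in for a codec; only the part relevant to lookup is modeled. */
    static class GzipLikeCodec {
        String getDefaultExtension() {
            return ".gz";
        }
    }

    /** The workaround: the same codec, advertised under a different extension. */
    static class GzipForDatFiles extends GzipLikeCodec {
        @Override
        String getDefaultExtension() {
            return ".dat";
        }
    }

    /** Returns the first registered codec whose extension matches the file name suffix. */
    static Optional<GzipLikeCodec> inferCodec(String fileName, List<GzipLikeCodec> registered) {
        for (GzipLikeCodec codec : registered) {
            if (fileName.endsWith(codec.getDefaultExtension())) {
                return Optional.of(codec);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<GzipLikeCodec> stock = Arrays.asList(new GzipLikeCodec());
        List<GzipLikeCodec> patched = Arrays.asList(new GzipLikeCodec(), new GzipForDatFiles());

        System.out.println(inferCodec("events.gz", stock).isPresent());    // true
        System.out.println(inferCodec("events.dat", stock).isPresent());   // false
        System.out.println(inferCodec("events.dat", patched).isPresent()); // true
    }
}
```

In this model the misnamed file is only recognized once a subclass re-declares the extension, which means one extra class per unexpected extension and no answer at all for files with no extension.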
Does it make sense to offer an explicit way to select the compression codec for
file input, mirroring how things work for file output?
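As a rough illustration of what I am asking for, here is a sketch in plain Java (java.util.zip only, no Hadoop classes) where the caller may name the codec explicitly and extension-based inference is only the fallback. Everything here is an assumption for illustration: the ExplicitCodecSketch class, the explicitCodec parameter, and the codec names "gzip" and "none" do not correspond to any existing Hadoop API or setting.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: explicit codec selection with extension inference as the fallback.
class ExplicitCodecSketch {

    /**
     * Wraps the raw stream in a decompressor. explicitCodec is a hypothetical
     * caller-supplied setting; null means "infer from the file name suffix".
     */
    static InputStream open(String fileName, InputStream raw, String explicitCodec) throws IOException {
        String codec = explicitCodec;
        if (codec == null) {
            codec = fileName.endsWith(".gz") ? "gzip" : "none";
        }
        switch (codec) {
            case "gzip":
                return new GZIPInputStream(raw);
            case "none":
                return raw;
            default:
                throw new IllegalArgumentException("Unknown codec: " + codec);
        }
    }

    /** Helper for the demo: gzip-compresses a string in memory. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    /** Helper for the demo: reads a stream fully as UTF-8 text. */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = gzip("hello");
        // The name has no extension, so inference alone would treat the data as uncompressed.
        try (InputStream in = open("part-00000", new ByteArrayInputStream(data), "gzip")) {
            System.out.println(readAll(in)); // hello
        }
    }
}
```

Keeping inference as the default means existing jobs would behave as before; only callers that set the explicit value would bypass the extension check.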