[jira] [Created] (HADOOP-17562) Provide mechanism for explicitly specifying the compression codec for input files

Nicholas Chammas (Jira) Wed, 03 Mar 2021 09:58:08 -0800

Nicholas Chammas created HADOOP-17562:
-----------------------------------------


             Summary: Provide mechanism for explicitly specifying the 
compression codec for input files
                 Key: HADOOP-17562
                 URL: https://issues.apache.org/jira/browse/HADOOP-17562
             Project: Hadoop Common
          Issue Type: Improvement
            Reporter: Nicholas Chammas


I come to you via SPARK-29280.

I am looking for the file _input_ equivalents of the following settings:
{code:java}
mapreduce.output.fileoutputformat.compress
mapreduce.map.output.compress
{code}
As I understand it, Hadoop currently infers the codec to use when reading a 
file from the file's extension.

However, in some cases files have an incorrect extension or no extension at 
all. SPARK-29280 links to some examples.

Ideally, you should be able to explicitly specify the codec to use to read 
those files. I don't believe that's possible today. Instead, the current 
workaround appears to be to [create a custom codec 
class|https://stackoverflow.com/a/17152167/877069] and override the 
getDefaultExtension method to specify the extension to expect.
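
To make the gap concrete, here is a minimal sketch in plain Java that only mimics the behavior described above. It does not use Hadoop's API: the class InputCodecSketch, the Codec enum, and the open method are hypothetical names invented for illustration. It shows gzip data stored under a name without a ".gz" suffix being read incorrectly when the codec is inferred from the name, and correctly when the caller names the codec explicitly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class InputCodecSketch {

    /** Hypothetical codec names for this sketch; these are not Hadoop classes. */
    enum Codec { NONE, GZIP }

    /** Mimics extension-based inference: only the file name is consulted. */
    static Codec inferFromExtension(String fileName) {
        return fileName.endsWith(".gz") ? Codec.GZIP : Codec.NONE;
    }

    /** Uses the explicit codec when one is given, otherwise falls back to inference. */
    static InputStream open(String fileName, byte[] contents, Codec explicit) throws IOException {
        Codec codec = (explicit != null) ? explicit : inferFromExtension(fileName);
        InputStream raw = new ByteArrayInputStream(contents);
        return codec == Codec.GZIP ? new GZIPInputStream(raw) : raw;
    }

    /** Helper that produces gzip-compressed bytes for the demo. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    /** Reads the whole stream as UTF-8 text and closes it. */
    static String readAll(InputStream in) throws IOException {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Gzip data stored under a name that lacks the ".gz" suffix.
        byte[] data = gzip("hello");
        String viaInference = readAll(open("part-00000", data, null));
        String viaExplicit = readAll(open("part-00000", data, Codec.GZIP));
        System.out.println("inferred codec recovers text: " + viaInference.equals("hello"));
        System.out.println("explicit codec recovers text: " + viaExplicit.equals("hello"));
    }
}
```

With inference alone, the compressed bytes are passed through undecoded, so the text is not recovered; passing Codec.GZIP explicitly recovers it. An input-side setting in Hadoop would play the role of the explicit argument here.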

Does it make sense to offer an explicit way to select the compression codec for 
file input, mirroring how things work for file output?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
