[ https://issues.apache.org/jira/browse/HADOOP-17562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nicholas Chammas updated HADOOP-17562:
--------------------------------------
Component/s: io
> Provide mechanism for explicitly specifying the compression codec for input
> files
> ---------------------------------------------------------------------------------
>
> Key: HADOOP-17562
> URL: https://issues.apache.org/jira/browse/HADOOP-17562
> Project: Hadoop Common
> Issue Type: Improvement
> Components: io
> Reporter: Nicholas Chammas
> Priority: Minor
>
> I come to you via SPARK-29280.
> I am looking for the file _input_ equivalents of the following settings:
> {code:java}
> mapreduce.output.fileoutputformat.compress
> mapreduce.map.output.compress
> {code}
> Right now, I understand that Hadoop infers the compression codec to use for
> an input file from the file's extension.
> However, in some cases the files may have an incorrect extension or no
> extension at all. There are links to some examples from SPARK-29280.
> Ideally, you should be able to explicitly specify the codec to use to read
> those files. I don't believe that's possible today. Instead, the current
> workaround appears to be to [create a custom codec
> class|https://stackoverflow.com/a/17152167/877069] and override the
> getDefaultExtension method to specify the extension to expect.
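> Concretely, that workaround is a tiny subclass (an untested sketch assuming
> Hadoop's GzipCodec; ".log" stands in for whatever extension the gzipped input
> files actually carry):
> {code:java}
> import org.apache.hadoop.io.compress.GzipCodec;
>
> // A gzip codec that claims ".log" as its extension, so the
> // CompressionCodecFactory suffix match selects it for misnamed gzip input.
> public class GzipLogCodec extends GzipCodec {
>     @Override
>     public String getDefaultExtension() {
>         return ".log";
>     }
> }
> {code}
> The subclass then has to be compiled, put on the classpath, and registered
> through io.compression.codecs, which is the indirection this proposal would
> make unnecessary.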
> Does it make sense to offer an explicit way to select the compression codec
> for file input, mirroring how things work for file output?
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]