[
https://issues.apache.org/jira/browse/SPARK-13766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon updated SPARK-13766:
---------------------------------
Comment: was deleted
(was: Partly due to "auto detection" for data sources (SPARK-8000). If that
happens, then this will fail to load back in Spark. I haven't tested this yet;
if you think I need to test it, I will.)
> Inconsistent file extensions and omitted file extensions written by CSV, TEXT
> and JSON data sources
> ---------------------------------------------------------------------------------------------------
>
> Key: SPARK-13766
> URL: https://issues.apache.org/jira/browse/SPARK-13766
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Hyukjin Kwon
> Priority: Minor
>
> Currently, the output (part-files) from CSV, TEXT and JSON data sources does
> not have file extensions such as .csv, .txt and .json (except for compression
> extensions such as .gz, .deflate and .bz2).
> In addition, it looks like Parquet part-files have extensions such as
> .gz.parquet or .snappy.parquet according to the compression codec, whereas ORC
> part-files do not carry the codec and are always just .orc.
> So, in a simple view, currently the extensions are set as below:
> {code}
> TEXT, CSV and JSON - [.COMPRESSION_CODEC_NAME]
> Parquet - [.COMPRESSION_CODEC_NAME].parquet
> ORC - .orc
> {code}
> It would be great if we had consistent naming for them.
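
To make the inconsistency concrete, here is a minimal Python sketch. `current_extension` only restates the table from the description above. `consistent_extension` is one hypothetical scheme (codec first, then format, following the Parquet pattern) and is an assumption for illustration, not a decision taken in this issue. Both function names and the `FORMAT_EXT` mapping are invented for this sketch and are not Spark APIs.

```python
from typing import Optional

# Hypothetical mapping from data source name to a format extension.
FORMAT_EXT = {
    "text": ".txt",
    "csv": ".csv",
    "json": ".json",
    "parquet": ".parquet",
    "orc": ".orc",
}


def _codec_part(codec_ext: Optional[str]) -> str:
    """Return '.gz' for 'gz', or '' when no compression codec is set."""
    return f".{codec_ext}" if codec_ext else ""


def current_extension(fmt: str, codec_ext: Optional[str] = None) -> str:
    """Part-file extension as described in the issue report."""
    fmt = fmt.lower()
    if fmt in ("text", "csv", "json"):
        return _codec_part(codec_ext)               # codec only, no format
    if fmt == "parquet":
        return _codec_part(codec_ext) + ".parquet"  # codec, then format
    if fmt == "orc":
        return ".orc"                               # format only, codec ignored
    raise ValueError(f"unknown data source: {fmt}")


def consistent_extension(fmt: str, codec_ext: Optional[str] = None) -> str:
    """One possible uniform scheme (an assumption): [.codec].format."""
    return _codec_part(codec_ext) + FORMAT_EXT[fmt.lower()]


if __name__ == "__main__":
    for fmt, codec in [("csv", "gz"), ("json", None),
                       ("parquet", "snappy"), ("orc", "snappy")]:
        print(fmt, codec,
              repr(current_extension(fmt, codec)),
              repr(consistent_extension(fmt, codec)))
```

Under the hypothetical scheme every part-file carries its format extension, so a reader could infer the data source from the file name alone. Whether the codec should come before or after the format extension is left open here.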