[ https://issues.apache.org/jira/browse/SPARK-13766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-13766:
---------------------------------
    Comment: was deleted

(was: This is partly related to "auto detection" for data sources (SPARK-8000). If that 
is implemented, these files may fail to load back in Spark (I haven't tested this yet; 
if you think I need to, I will).)

> Inconsistent file extensions and omitted file extensions written by CSV, TEXT 
> and JSON data sources
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-13766
>                 URL: https://issues.apache.org/jira/browse/SPARK-13766
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Hyukjin Kwon
>            Priority: Minor
>
> Currently, the output part-files from the CSV, TEXT and JSON data sources 
> have no format extensions such as .csv, .txt and .json; they only carry a 
> compression extension such as .gz, .deflate or .bz2 when a codec is set.
> In addition, Parquet part-files appear to carry codec-dependent extensions 
> such as .gz.parquet or .snappy.parquet, whereas ORC part-files have no codec 
> extension and always end in just .orc.
> In short, the extensions are currently set as follows:
> {code}
> TEXT, CSV and JSON - [.COMPRESSION_CODEC_NAME]
> Parquet -  [.COMPRESSION_CODEC_NAME].parquet
> ORC - .orc
> {code}
> It would be great to have consistent naming across these data sources.
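
A minimal Python sketch of the extension rules summarized in the table, next to one possible consistent scheme. It is illustrative only and is not Spark code: the function names, the FORMAT_EXT mapping and the proposed scheme are hypothetical, and codecs are passed as their file suffixes (gz, bz2, snappy) rather than as codec names.

```python
# Hypothetical model of the part-file extension rules described in SPARK-13766.
# Not Spark code: names, the FORMAT_EXT mapping and the proposed scheme are
# assumptions for illustration. Codecs are given as file suffixes ("gz", "snappy").
from typing import Optional

TEXT_BASED = {"text", "csv", "json"}  # whole file is one (optionally compressed) stream
FORMAT_EXT = {
    "text": ".txt",
    "csv": ".csv",
    "json": ".json",
    "parquet": ".parquet",
    "orc": ".orc",
}


def current_extension(fmt: str, codec_ext: Optional[str] = None) -> str:
    """Extension as listed in the issue for Spark 2.0.0."""
    codec_part = f".{codec_ext}" if codec_ext else ""
    if fmt in TEXT_BASED:
        # TEXT, CSV and JSON: only the compression suffix, if any.
        return codec_part
    if fmt == "parquet":
        # Parquet: codec suffix followed by .parquet.
        return codec_part + ".parquet"
    if fmt == "orc":
        # ORC: always .orc, whatever the codec.
        return ".orc"
    raise ValueError(f"unknown format: {fmt}")


def consistent_extension(fmt: str, codec_ext: Optional[str] = None) -> str:
    """One possible uniform scheme (hypothetical): always format, plus codec if set."""
    base = FORMAT_EXT[fmt]
    if not codec_ext:
        return base
    if fmt in TEXT_BASED:
        # Compressed text streams: codec suffix last, e.g. .csv.gz.
        return f"{base}.{codec_ext}"
    # Columnar formats compress internally: format extension last, e.g. .snappy.orc.
    return f".{codec_ext}{base}"
```

For example, consistent_extension("csv", "gz") gives ".csv.gz" while current_extension("csv", "gz") gives only ".gz". Under this assumed scheme, text-based outputs keep the codec suffix last so that tools such as gzip still recognize the file, while Parquet and ORC, which compress data internally, keep their format extension last as Parquet already does.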



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
