[ https://issues.apache.org/jira/browse/SPARK-8000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153562#comment-15153562 ]

Reynold Xin commented on SPARK-8000:
------------------------------------

I think we can do some simple things first. For example, if the file name 
contains "parquet", then use "parquet". If it contains "json", use "json". 
If it contains "csv", use "csv". If it contains "txt", use "text".

As long as this is kept modular, and produces a proper error message when it 
cannot figure out the data source, we should be OK.
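
A minimal sketch of that detection logic (the object name FormatInference 
and the method inferFormat are illustrative only, not existing Spark APIs):

{code:scala}
object FormatInference {
  // Substrings to look for in the file name, mapped to data source names.
  private val knownFormats: Seq[(String, String)] = Seq(
    "parquet" -> "parquet",
    "json"    -> "json",
    "csv"     -> "csv",
    "txt"     -> "text"
  )

  // Returns the inferred data source name, or fails with a message that
  // tells the caller to specify the format explicitly.
  def inferFormat(path: String): String = {
    val lower = path.toLowerCase
    knownFormats.collectFirst {
      case (token, format) if lower.contains(token) => format
    }.getOrElse {
      throw new IllegalArgumentException(
        s"Cannot infer data source from path '$path'; " +
        "please specify it explicitly, e.g. read.format(\"json\")")
    }
  }
}
{code}

For example, FormatInference.inferFormat("/logs/events.json") would return 
"json", while an unrecognized name fails fast with the error above.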


> SQLContext.read.load() should be able to auto-detect input data
> ---------------------------------------------------------------
>
>                 Key: SPARK-8000
>                 URL: https://issues.apache.org/jira/browse/SPARK-8000
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>
> If it is a Parquet file, use Parquet. If it is a JSON file, use JSON. If it 
> is an ORC file, use ORC. If it is a CSV file, use CSV.
> Maybe Spark SQL can also write an output metadata file to specify the schema 
> and data source that were used.
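
On the metadata idea in the quoted description, one possible shape for such a 
sidecar file, sketched purely for illustration (the file name 
_datasource_metadata.json and this helper are hypothetical, not an existing 
Spark feature):

{code:scala}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

object MetadataSidecar {
  // Writes a small JSON sidecar into the output directory recording which
  // data source and schema produced it, so a later read can skip format
  // detection entirely.
  def write(outputDir: String, format: String, schemaJson: String): Unit = {
    val content = s"""{"format": "$format", "schema": $schemaJson}"""
    Files.write(
      Paths.get(outputDir, "_datasource_metadata.json"),
      content.getBytes(StandardCharsets.UTF_8))
  }
}
{code}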


