[ https://issues.apache.org/jira/browse/SPARK-8000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153562#comment-15153562 ]
Reynold Xin commented on SPARK-8000:
------------------------------------

I think we can do some simple things first. For example, if the file name contains "parquet", use "parquet"; if it contains "json", use "json"; if it contains "csv", use "csv"; if it is "txt", use "text". As long as this is very modular and produces a proper error message when it cannot figure out the data source, we should be OK.

> SQLContext.read.load() should be able to auto-detect input data
> ---------------------------------------------------------------
>
>          Key: SPARK-8000
>          URL: https://issues.apache.org/jira/browse/SPARK-8000
>      Project: Spark
>   Issue Type: Sub-task
>   Components: SQL
>     Reporter: Reynold Xin
>
> If it is a parquet file, use parquet. If it is a JSON file, use JSON. If it
> is an ORC file, use ORC. If it is a CSV file, use CSV.
> Maybe Spark SQL can also write an output metadata file to specify the schema
> & data source that's used.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
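The name-based detection described in the comment could be sketched as follows. This is only an illustration of the idea, not Spark's actual implementation; the function name `infer_format` and the `KNOWN_FORMATS` table are assumptions made for this sketch.

```python
import os

# Hypothetical mapping from a token in the file name to a data source name.
# This table is an assumption for illustration; it mirrors the formats the
# comment and the issue description mention (parquet, json, csv, text, orc).
KNOWN_FORMATS = {
    "parquet": "parquet",
    "json": "json",
    "csv": "csv",
    "orc": "orc",
    "txt": "text",
}

def infer_format(path: str) -> str:
    """Guess a data source from the file name, or fail with a clear message."""
    name = os.path.basename(path).lower()
    for token, fmt in KNOWN_FORMATS.items():
        if token in name:
            return fmt
    # The comment stresses a proper error message when detection fails.
    raise ValueError(
        f"Cannot infer data source for '{path}'; "
        "please specify the format explicitly, "
        "e.g. sqlContext.read.format('parquet').load(path)"
    )
```

A modular table like this keeps each format rule independent, so adding a new source (or tightening a rule from substring matching to extension matching) is a one-line change.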