[ https://issues.apache.org/jira/browse/SPARK-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048189#comment-15048189 ]
Yanbo Liang edited comment on SPARK-12232 at 12/9/15 7:18 AM:
--------------------------------------------------------------

I vote for not exposing read.table, because its semantics differ from base R and the other read.*** functions. In "SQLContext.read.table(tableName: String)", users load a table as a DataFrame by specifying its name, but the table metadata must already exist in a catalog such as "HiveMetastoreCatalog". This means users cannot use "read.table()" to load an external data source as a DataFrame if its metadata is not stored in the Spark catalog; they must know the file format and use the corresponding function, such as "read.json". The read.table interface is mainly used to access a table that has already been loaded into Spark as an RDD on the Spark SQL side. Considering that RDDs will be deprecated in 2.0, I think it is unnecessary for SparkR.
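The naming conflict under discussion can be sketched in R. This is an illustrative comparison only, assuming a SparkR 1.5-era session where `sqlContext` is already initialized; `read.json` is one of the readers the issue notes was still in pending PRs at the time, and the file paths are hypothetical:

```r
# base R: utils::read.table parses a local delimited file
# and returns an ordinary R data.frame
local_df <- read.table("people.txt", header = TRUE)

# SparkR today: table() loads a catalog table as a Spark DataFrame,
# but the metadata must already exist in the catalog
# (e.g. HiveMetastoreCatalog) -- there is no file-format inference
spark_df <- table(sqlContext, "people")

# an external data source with no catalog entry needs a
# format-specific reader instead
json_df <- read.json(sqlContext, "people.json")
```

Exporting a SparkR read.table would mask utils::read.table, so the first call above would silently change meaning in scripts that attach SparkR, which is the conflict the issue description points out.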
> Consider exporting read.table in R
> ----------------------------------
>
>                 Key: SPARK-12232
>                 URL: https://issues.apache.org/jira/browse/SPARK-12232
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 1.5.2
>            Reporter: Felix Cheung
>            Priority: Minor
>
> Since we have read.df, read.json, and read.parquet (some in pending PRs), and we
> have table(), we should consider having read.table() for consistency and
> R-likeness.
> However, this conflicts with utils::read.table, which returns an R data.frame.
> It seems neither table() nor read.table() is desirable in this case.
> table: https://stat.ethz.ch/R-manual/R-devel/library/base/html/table.html
> read.table:
> https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)