[ https://issues.apache.org/jira/browse/SPARK-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048189#comment-15048189 ]
Yanbo Liang edited comment on SPARK-12232 at 12/9/15 7:18 AM:
--------------------------------------------------------------

I vote for not exposing read.table, because its semantics differ from base R and the other read.*** functions. In "SQLContext.read.table(tableName: String)", users load a table as a DataFrame by specifying its name, but the table metadata must already exist in a catalog such as "HiveMetastoreCatalog". This means users cannot use "read.table()" to load an external data source as a DataFrame if its metadata is not stored in the Spark catalog; they must know the file format and use the corresponding function, such as "read.json". The read.table interface is mainly used to access a table that has already been loaded into Spark as an RDD on the Spark SQL side. Considering that RDDs will be deprecated in 2.0, I think it is unnecessary for SparkR.
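The naming conflict under discussion can be sketched in R. This is an illustrative comparison only, assuming a SparkR 1.5-era session where `sqlContext` is already initialized; `read.json` is one of the readers the issue notes was still in pending PRs at the time, and the file paths are hypothetical:

```r
# base R: utils::read.table parses a local delimited file
# and returns an ordinary R data.frame
local_df <- read.table("people.txt", header = TRUE)

# SparkR today: table() loads a catalog table as a Spark DataFrame,
# but the metadata must already exist in the catalog
# (e.g. HiveMetastoreCatalog) -- there is no file-format inference
spark_df <- table(sqlContext, "people")

# an external data source with no catalog entry needs a
# format-specific reader instead
json_df <- read.json(sqlContext, "people.json")
```

Exporting a SparkR read.table would mask utils::read.table, so the first call above would silently change meaning in scripts that attach SparkR, which is the conflict the issue description points out.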
> Consider exporting read.table in R
> ----------------------------------
>
>                 Key: SPARK-12232
>                 URL: https://issues.apache.org/jira/browse/SPARK-12232
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 1.5.2
>            Reporter: Felix Cheung
>            Priority: Minor
>
> Since we have read.df, read.json, and read.parquet (some in pending PRs), and we
> have table(), we should consider having read.table() for consistency and
> R-likeness.
> However, this conflicts with utils::read.table, which returns an R data.frame.
> It seems neither table() nor read.table() is desirable in this case.
> table: https://stat.ethz.ch/R-manual/R-devel/library/base/html/table.html
> read.table:
> https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)