[
https://issues.apache.org/jira/browse/SPARK-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533918#comment-14533918
]
Sun Rui edited comment on SPARK-7435 at 5/8/15 5:57 AM:
--------------------------------------------------------
[~shivaram] Thank you for pointing out the reason for this situation. As
documented in
https://stat.ethz.ch/R-manual/R-devel/library/methods/html/show.html, show() is
invoked for automatic printing of an S4 object, much like toString() in
Scala and __repr__() in pySpark. I agree that we should keep show() as is.
As for showDF() (the counterpart of show() in Scala/pySpark), it serves a
different goal from head(): head() retrieves row objects, while showDF()
prints rows in tabular form. The following code demonstrates the difference:
{panel}
df<-createDataFrame(sqlCtx, list(1,2,3))
head(df)
_1
1 1
2 2
3 3
cat(showDF(df))
+---+
| _1|
+---+
|1.0|
|2.0|
|3.0|
+---+
{panel}
I would suggest keeping showDF(). However, showDF() currently has a problem: it
does not honor the escape characters in the string returned by Scala
DF.showString(), so its output looks like this:
"+---+\n| _1|\n+---+\n|1.0|\n|2.0|\n|3.0|\n+---+\n"
I think we can fix this by using the R cat() function to print the string.
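As a minimal sketch of the difference in base R (the variable name s and the shortened table string are illustrative assumptions, standing in for whatever DF.showString() returns), print() shows the string quoted with its newlines still escaped, while cat() writes the characters as they are:

```r
# Illustrative stand-in for the string returned by Scala DF.showString()
s <- "+---+\n| _1|\n+---+\n|1.0|\n+---+\n"

# print() (and auto-printing) shows a quoted string with "\n" left escaped,
# so the whole table ends up on a single output line.
printed <- capture.output(print(s))

# cat() emits the raw characters, so each "\n" becomes a real line break.
shown <- capture.output(cat(s))

print(s)
cat(s)
```

Here capture.output() is used only to make the two behaviours easy to compare; a fix along the lines suggested above would presumably just call cat() on the returned string.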
> Make DataFrame.show() consistent with that of Scala and pySpark
> ---------------------------------------------------------------
>
> Key: SPARK-7435
> URL: https://issues.apache.org/jira/browse/SPARK-7435
> Project: Spark
> Issue Type: Improvement
> Components: SparkR
> Affects Versions: 1.4.0
> Reporter: Sun Rui
> Priority: Blocker
>
> Currently in SparkR, DataFrame has two methods show() and showDF(). show()
> prints the DataFrame column names and types and showDF() prints the first
> numRows rows of a DataFrame.
> In Scala and pySpark, show() is used to print rows of a DataFrame.
> We'd better keep the API consistent unless there is some important reason
> not to. So I propose to interchange the names (show() and showDF()) in SparkR.