[ https://issues.apache.org/jira/browse/SPARK-16468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371188#comment-15371188 ]
Shivaram Venkataraman commented on SPARK-16468:
-----------------------------------------------
Is the character-column problem fixed by SPARK-16429? Regarding the
rounding, I think we just create an R data.frame and then let R format it. Could
you check what the output of `options("digits")` is in your R session?
cc [~dongjoon]
> Confusing results when describe() used on DataFrame with chr columns
> --------------------------------------------------------------------
>
> Key: SPARK-16468
> URL: https://issues.apache.org/jira/browse/SPARK-16468
> Project: Spark
> Issue Type: Bug
> Components: SparkR
> Affects Versions: 1.6.1
> Environment: Databricks.com
> Reporter: Neil Dewar
> Priority: Minor
>
> The describe() function returns statistical summaries on numeric columns of a
> DataFrame. For columns of type chr, only the count, min and max stats are
> returned.
> When a DataFrame contains a mixture of numeric and chr columns, the results
> become jumbled together.
> Example:
> sdfR <- createDataFrame(sqlContext, ToothGrowth)
> collect(describe(sdfR))
> Results:
>   summary                len supp               dose
> 1   count                 60   60                 60
> 2    mean 18.813333333333336      1.1666666666666667
> 3  stddev  7.649315171887615      0.6288721857330792
> 4     min                4.2   OJ                0.5
> 5     max               33.9   VC                2.0
> There appear to be two problems here:
> (1) The mean and stddev values are not rounded in the columns that have
> valid values.
> (2) There is no way to tell that the supp column has no values in the
> mean and stddev rows.