[jira] [Commented] (SPARK-16468) Confusing results when describe() used on DataFrame with chr columns

Shivaram Venkataraman (JIRA) Mon, 11 Jul 2016 10:17:46 -0700

    [ https://issues.apache.org/jira/browse/SPARK-16468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371188#comment-15371188 ]


Shivaram Venkataraman commented on SPARK-16468:
-----------------------------------------------

Is the character-column problem fixed by SPARK-16429? Regarding the 
rounding, I think we just create an R data.frame and then let R format it. Could 
you check what `options("digits")` returns in your R session?
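
To illustrate why `options("digits")` matters here, a minimal base-R sketch (no Spark involved; `num_df` and `chr_df` are made-up names, and the values are copied from the report). It assumes the collected summary columns may be character rather than numeric, which the mixed formatting in the reported output (60, 18.813333333333336 and 4.2 in one column) seems to suggest. If so, R prints the strings verbatim and the `digits` setting would have no effect on them.

```r
# Numeric columns: printing a data.frame honours options("digits").
num_df <- data.frame(len = 18.813333333333336, dose = 1.1666666666666667)

old <- options(digits = 4)                 # temporarily lower the print precision
num_out <- capture.output(print(num_df))
options(old)                               # restore the previous setting

# Character columns: the strings are printed as-is, whatever "digits" is.
chr_df <- data.frame(len = "18.813333333333336", stringsAsFactors = FALSE)

old <- options(digits = 4)
chr_out <- capture.output(print(chr_df))
options(old)

print(num_out)
print(chr_out)

# Checking the column types of a collected result is one way to tell the
# two cases apart, e.g. sapply(chr_df, class).
```

Running `sapply(collect(describe(sdfR)), class)` in the affected session should show which case applies.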

cc [~dongjoon] 

> Confusing results when describe() used on DataFrame with chr columns
> --------------------------------------------------------------------
>
>                 Key: SPARK-16468
>                 URL: https://issues.apache.org/jira/browse/SPARK-16468
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 1.6.1
>         Environment: Databricks.com
>            Reporter: Neil Dewar
>            Priority: Minor
>
> The describe() function returns statistical summaries on numeric columns of a 
> DataFrame.  If the DataFrame contains columns of type chr, only the count, 
> min and max stats are returned.
> When a dataframe contains a mixture of numeric and chr columns, the results 
> become jumbled together.
> Example:
> sdfR <- createDataFrame(sqlContext, ToothGrowth)
> collect(describe(sdfR))
> Results:
>    summary                len supp               dose
> 1   count                 60   60                 60
> 2    mean 18.813333333333336  1.1666666666666667
> 3  stddev  7.649315171887615  0.6288721857330792
> 4     min                4.2   OJ                0.5
> 5     max               33.9   VC                2.0
> There appear to be two problems here:
> (1) The mean and stddev values have not been rounded for the columns where 
> there are valid values.
> (2) There is no way to tell that the supp column has no values in the 
> mean and stddev rows.
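
As a possible client-side workaround for both points, here is a rough base-R sketch. It does not call SparkR: `desc` is a hand-built stand-in for `collect(describe(sdfR))` using the values from the report, and it assumes every column is character and that the missing supp statistics arrive as empty strings. The helper `round_if_numeric` is hypothetical, not part of SparkR.

```r
# Hypothetical stand-in for collect(describe(sdfR)); all columns are character.
# Values are copied from the report; the empty strings for supp are an assumption.
desc <- data.frame(
  summary = c("count", "mean", "stddev", "min", "max"),
  len     = c("60", "18.813333333333336", "7.649315171887615", "4.2", "33.9"),
  supp    = c("60", "", "", "OJ", "VC"),
  dose    = c("60", "1.1666666666666667", "0.6288721857330792", "0.5", "2.0"),
  stringsAsFactors = FALSE
)

# Turn blank cells into explicit NA so missing statistics are visible.
desc[desc == ""] <- NA

# Round entries that parse as numbers; leave other strings (e.g. "OJ") and NA as-is.
round_if_numeric <- function(x, digits = 2) {
  num <- suppressWarnings(as.numeric(x))
  ifelse(is.na(num), x, as.character(round(num, digits)))
}

# Apply the helper to every statistic column, keeping the summary labels untouched.
stat_cols <- setdiff(names(desc), "summary")
desc[stat_cols] <- lapply(desc[stat_cols], round_if_numeric)

print(desc)
```

The columns stay character so that min and max of a chr column can sit alongside the rounded numbers; converting to numeric outright would turn "OJ" and "VC" into NA.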


