[ 
https://issues.apache.org/jira/browse/SPARK-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DB Tsai updated SPARK-1969:
---------------------------

    Description: 
This change moves the private ColumnStatisticsAggregator class out of RowMatrix 
and exposes it as a publicly available DeveloperApi. 

Changes:
1) Moved the trait from 
org.apache.spark.mllib.stat.MultivariateStatisticalSummary to 
org.apache.spark.mllib.stats.Summarizer 
2) Moved the private implementation from 
org.apache.spark.mllib.linalg.ColumnStatisticsAggregator to 
org.apache.spark.mllib.stats.OnlineSummarizer 
3) The OnlineSummarizer constructor no longer takes the number of columns; 
it is determined when the first sample is added.
4) Added API documentation for OnlineSummarizer
5) Added unit tests for OnlineSummarizer

  was:
Basically, it moves the private ColumnStatisticsAggregator class from RowMatrix 
to public available DeveloperApi. 

Changes:
1) Moved the trait from 
org.apache.spark.mllib.stat.MultivariateStatisticalSummary to 
org.apache.spark.mllib.stats.Summarizer 
2) Moved the private implementation from org.apache.spark.mllib.linalg. 
ColumnStatisticsAggregator to org.apache.spark.mllib.stats.OnlineSummarizer
3) When creating OnlineSummarizer object, the number of columns is not needed 
in the constructor. It's determined when users add the first sample.
4) Added the API documentation for OnlineSummarizer
5) Added the unittest for OnlineSummarizer


> Public available online summarizer for mean, variance, min, and max
> -------------------------------------------------------------------
>
>                 Key: SPARK-1969
>                 URL: https://issues.apache.org/jira/browse/SPARK-1969
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: DB Tsai
>
> This change moves the private ColumnStatisticsAggregator class out of 
> RowMatrix and exposes it as a publicly available DeveloperApi. 
> Changes:
> 1) Moved the trait from 
> org.apache.spark.mllib.stat.MultivariateStatisticalSummary to 
> org.apache.spark.mllib.stats.Summarizer 
> 2) Moved the private implementation from 
> org.apache.spark.mllib.linalg.ColumnStatisticsAggregator to 
> org.apache.spark.mllib.stats.OnlineSummarizer 
> 3) The OnlineSummarizer constructor no longer takes the number of columns; 
> it is determined when the first sample is added.
> 4) Added API documentation for OnlineSummarizer
> 5) Added unit tests for OnlineSummarizer
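
To illustrate point 3 above, here is a minimal, library-independent sketch of
an online summarizer whose column count is fixed by the first sample rather
than by the constructor. It is written in plain Python so it runs without
Spark, and it is not the code from the patch: the class layout, the method
names (add, merge), the use of Welford-style updates, and the choice of the
unbiased sample variance are all assumptions made for illustration.

```python
class OnlineSummarizer:
    """Hypothetical sketch of a per-column online summarizer (not the Spark class)."""

    def __init__(self):
        # No column count is passed in; it is learned from the first sample.
        self.count = 0
        self.num_cols = None
        self._mean = self._m2 = self._min = self._max = None

    def _init_columns(self, n):
        self.num_cols = n
        self._mean = [0.0] * n
        self._m2 = [0.0] * n  # running sum of squared deviations from the mean
        self._min = [float("inf")] * n
        self._max = [float("-inf")] * n

    def add(self, sample):
        """Fold one sample (a sequence of numbers) into the running statistics."""
        if self.num_cols is None:
            self._init_columns(len(sample))
        elif len(sample) != self.num_cols:
            raise ValueError("expected %d columns, got %d" % (self.num_cols, len(sample)))
        self.count += 1
        for j, x in enumerate(sample):
            delta = x - self._mean[j]
            self._mean[j] += delta / self.count
            self._m2[j] += delta * (x - self._mean[j])
            self._min[j] = min(self._min[j], x)
            self._max[j] = max(self._max[j], x)
        return self

    def merge(self, other):
        """Combine another summarizer into this one, e.g. across partitions."""
        if other.count == 0:
            return self
        if self.num_cols is None:
            self._init_columns(other.num_cols)
        elif self.num_cols != other.num_cols:
            raise ValueError("column counts differ")
        total = self.count + other.count
        for j in range(self.num_cols):
            delta = other._mean[j] - self._mean[j]
            self._mean[j] += delta * other.count / total
            self._m2[j] += other._m2[j] + delta * delta * self.count * other.count / total
            self._min[j] = min(self._min[j], other._min[j])
            self._max[j] = max(self._max[j], other._max[j])
        self.count = total
        return self

    @property
    def mean(self):
        return list(self._mean)

    @property
    def variance(self):
        # Assumption: unbiased sample variance; zero when fewer than two samples.
        if self.count < 2:
            return [0.0] * self.num_cols
        return [m2 / (self.count - 1) for m2 in self._m2]

    @property
    def min(self):
        return list(self._min)

    @property
    def max(self):
        return list(self._max)


# Usage: the summarizer is created empty and sized by the first add().
summary = OnlineSummarizer()
for row in ([1.0, 10.0], [2.0, 20.0], [3.0, 30.0]):
    summary.add(row)
```

Deferring the column count this way means an empty summarizer can serve as
the zero value of an aggregation, with merge combining partial results
computed on separate partitions.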



--
This message was sent by Atlassian JIRA
(v6.2#6252)
