[
https://issues.apache.org/jira/browse/SPARK-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
DB Tsai updated SPARK-1969:
---------------------------
Description:
This change moves the private ColumnStatisticsAggregator class out of RowMatrix
and exposes it as a publicly available DeveloperApi.
Changes:
1) Moved the trait from
org.apache.spark.mllib.stat.MultivariateStatisticalSummary to
org.apache.spark.mllib.stats.Summarizer
2) Moved the private implementation from
org.apache.spark.mllib.linalg.ColumnStatisticsAggregator to
org.apache.spark.mllib.stats.OnlineSummarizer
3) When creating an OnlineSummarizer object, the number of columns is not needed
in the constructor. It is determined when the user adds the first sample.
4) Added API documentation for OnlineSummarizer
5) Added unit tests for OnlineSummarizer
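
To illustrate item 3, here is a minimal, standalone sketch of an online summarizer whose column count is fixed by the first sample instead of by the constructor. It is not the Spark implementation: the class name OnlineSummarizerSketch, its method names, the use of Welford's update for the running variance, and the choice of the unbiased (n - 1) variance are all assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence


class OnlineSummarizerSketch:
    """Hypothetical illustration only; not Spark's OnlineSummarizer."""

    def __init__(self) -> None:
        # No column count is passed in; it is set by the first sample.
        self.count = 0
        self.dim: Optional[int] = None
        self.mean: List[float] = []
        self.m2: List[float] = []  # running sum of squared deviations
        self.min: List[float] = []
        self.max: List[float] = []

    def add(self, sample: Sequence[float]) -> "OnlineSummarizerSketch":
        if self.dim is None:
            # First sample: allocate per-column state now.
            self.dim = len(sample)
            self.mean = [0.0] * self.dim
            self.m2 = [0.0] * self.dim
            self.min = [math.inf] * self.dim
            self.max = [-math.inf] * self.dim
        elif len(sample) != self.dim:
            raise ValueError(
                f"expected {self.dim} columns, got {len(sample)}"
            )
        self.count += 1
        for j, x in enumerate(sample):
            # Welford's single-pass update (assumed here, see lead-in).
            delta = x - self.mean[j]
            self.mean[j] += delta / self.count
            self.m2[j] += delta * (x - self.mean[j])
            self.min[j] = min(self.min[j], x)
            self.max[j] = max(self.max[j], x)
        return self

    def variance(self) -> List[float]:
        # Unbiased sample variance; defined as 0.0 for fewer than 2 samples.
        if self.count < 2:
            return [0.0] * len(self.mean)
        return [m / (self.count - 1) for m in self.m2]


# Usage: the summarizer learns it has two columns from the first row.
summary = OnlineSummarizerSketch()
for row in [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]:
    summary.add(row)
```

Deferring the allocation to the first call of add means callers do not need to know the feature dimension up front, at the cost of a dimension check on every later sample.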
was:
Basically, it moves the private ColumnStatisticsAggregator class from RowMatrix
to public available DeveloperApi.
Changes:
1) Moved the trait from
org.apache.spark.mllib.stat.MultivariateStatisticalSummary to
org.apache.spark.mllib.stats.Summarizer
2) Moved the private implementation from org.apache.spark.mllib.linalg.
ColumnStatisticsAggregator to org.apache.spark.mllib.stats.OnlineSummarizer
3) When creating OnlineSummarizer object, the number of columns is not needed
in the constructor. It's determined when users add the first sample.
4) Added the API documentation for OnlineSummarizer
5) Added the unittest for OnlineSummarizer
> Public available online summarizer for mean, variance, min, and max
> -------------------------------------------------------------------
>
> Key: SPARK-1969
> URL: https://issues.apache.org/jira/browse/SPARK-1969
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Reporter: DB Tsai