[
https://issues.apache.org/jira/browse/FLINK-13924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Konstantin Knauf updated FLINK-13924:
-------------------------------------
Labels: pull-request-available (was: pull-request-available stale-major)
Removed "stale-critical|major|minor" label in line with
https://issues.apache.org/jira/browse/FLINK-22429.
> Add summarizer and summary for sparse vector and dense vector.
> --------------------------------------------------------------
>
> Key: FLINK-13924
> URL: https://issues.apache.org/jira/browse/FLINK-13924
> Project: Flink
> Issue Type: Sub-task
> Components: Library / Machine Learning
> Reporter: Xu Yang
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> A summarizer is the class that calculates statistics; a summary is its result
> class and defines the methods for reading those statistics. The data may
> contain both dense and sparse vectors, and the vectors may differ in size, so
> if a DenseVectorSummarizer visits a sparse vector, it changes to a
> SparseVectorSummarizer.
> Statistics include vectorSize, count, mean, variance, min, max,
> standardDeviation, normL1, normL2.
> * Add SparseVectorSummarizer, which calculates statistics for sparse
> vectors.
> * Add SparseVectorSummary, which exposes the statistics of sparse vectors.
> * Add DenseVectorSummarizer, which calculates statistics for dense vectors.
> * Add DenseVectorSummary, which exposes the statistics of dense vectors.
> * Add StatisticsUtil which provides utility functions for summarizer and
> summary.
> * Add VectorSummarizerUtil which provides utility functions for
> VectorSummarizer.
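
For illustration only, the per-dimension statistics listed in the description could be accumulated roughly as in the sketch below. This is not the code from the pull request: the class name VectorSummarizerSketch and all of its methods are hypothetical. It also makes several assumptions the issue does not state: a single map-based accumulator is used instead of switching from a dense to a sparse summarizer, positions a vector does not store count as zero, variance is the sample variance (divided by count - 1), and the norms are taken per dimension across all visited vectors.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; not the summarizer classes proposed in FLINK-13924. */
class VectorSummarizerSketch {
    /** Running totals for one dimension, over explicitly stored values only. */
    private static final class Stat {
        long stored; // number of vectors that stored a value at this index
        double sum, sumSq, sumAbs;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
    }

    private final Map<Integer, Stat> stats = new HashMap<>();
    private long count;     // number of vectors visited
    private int vectorSize; // largest index seen so far, plus one

    /** Visits a dense vector: every position is explicitly stored. */
    void visit(double[] values) {
        int[] indices = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            indices[i] = i;
        }
        visit(indices, values);
    }

    /** Visits a sparse vector given as parallel index and value arrays. */
    void visit(int[] indices, double[] values) {
        count++;
        for (int k = 0; k < indices.length; k++) {
            Stat s = stats.computeIfAbsent(indices[k], i -> new Stat());
            double v = values[k];
            s.stored++;
            s.sum += v;
            s.sumSq += v * v;
            s.sumAbs += Math.abs(v);
            s.min = Math.min(s.min, v);
            s.max = Math.max(s.max, v);
            vectorSize = Math.max(vectorSize, indices[k] + 1);
        }
    }

    long count() { return count; }

    int vectorSize() { return vectorSize; }

    double mean(int i) {
        Stat s = stats.get(i);
        return s == null ? 0.0 : s.sum / count;
    }

    /** Sample variance; returns 0 when fewer than two vectors were visited. */
    double variance(int i) {
        Stat s = stats.get(i);
        if (s == null || count < 2) {
            return 0.0;
        }
        // Clamp at zero to absorb small negative rounding errors.
        return Math.max(0.0, (s.sumSq - s.sum * s.sum / count) / (count - 1));
    }

    double standardDeviation(int i) { return Math.sqrt(variance(i)); }

    /** Minimum, counting an implicit zero if some vector did not store index i. */
    double min(int i) {
        Stat s = stats.get(i);
        if (s == null) {
            return 0.0;
        }
        return s.stored < count ? Math.min(s.min, 0.0) : s.min;
    }

    /** Maximum, counting an implicit zero if some vector did not store index i. */
    double max(int i) {
        Stat s = stats.get(i);
        if (s == null) {
            return 0.0;
        }
        return s.stored < count ? Math.max(s.max, 0.0) : s.max;
    }

    double normL1(int i) {
        Stat s = stats.get(i);
        return s == null ? 0.0 : s.sumAbs;
    }

    double normL2(int i) {
        Stat s = stats.get(i);
        return s == null ? 0.0 : Math.sqrt(s.sumSq);
    }
}
```

Keeping only running sums per index means implicit zeros never need to be visited, which is what makes one accumulator usable for both dense and sparse input. The sum-of-squares variance formula is simple but can lose precision when values are large relative to their spread; a production implementation might prefer a more stable update.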
--
This message was sent by Atlassian Jira
(v8.3.4#803005)