[ https://issues.apache.org/jira/browse/FLINK-12671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xu Yang updated FLINK-12671:
----------------------------
Description:
We provide summary statistics for a Table through Summarizer. Users can easily
get the total count and the basic column-wise metrics: max, min, mean, variance,
standardDeviation, normL1, normL2, the number of missing values, and the number
of valid values.
SparkML offers the same functionality:
[http://spark.apache.org/docs/latest/ml-statistics.html#summarizer]
{code:java|title=Example|borderStyle=solid}
// Column names and sample rows for the input table.
String[] colNames = new String[]{"id", "height", "weight"};
Row[] data = new Row[]{
    Row.of(1, 168, 48.1),
    Row.of(2, 165, 45.8),
    Row.of(3, 160, 45.3),
    Row.of(4, 163, 41.9),
    Row.of(5, 149, 40.5),
};
Table input = MLSession.createBatchTable(data, colNames);
// Compute the summary statistics for every column.
TableSummary summary = new Summarizer(input).collectResult();
// Print the mean of the "height" column, then the full summary.
System.out.println(summary.mean("height"));
System.out.println(summary);
{code}
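For illustration only, the column-wise metrics listed above can be sketched in plain Java, independent of the proposed Summarizer. The class ColumnSummary and all of its members are hypothetical names, not part of Flink; treating null as a missing value and using the sample variance (dividing by n - 1) are assumptions, not behavior confirmed by this issue.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper (not a Flink class): accumulates the summary
 * statistics of one numeric column in a single pass.
 */
final class ColumnSummary {
    long count;        // all rows, including missing ones
    long numMissing;   // rows whose value is null (assumed meaning of "missing")
    double sum;        // sum of valid values
    double sumSquares; // sum of squared valid values
    double normL1;     // sum of absolute valid values
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;

    static ColumnSummary of(List<? extends Number> column) {
        ColumnSummary s = new ColumnSummary();
        for (Number value : column) {
            s.count++;
            if (value == null) {
                s.numMissing++;
                continue;
            }
            double v = value.doubleValue();
            s.sum += v;
            s.sumSquares += v * v;
            s.normL1 += Math.abs(v);
            s.min = Math.min(s.min, v);
            s.max = Math.max(s.max, v);
        }
        return s;
    }

    long numValid() { return count - numMissing; }

    double mean() { return sum / numValid(); }

    /** Sample variance (divides by n - 1); an assumed convention. */
    double variance() {
        long n = numValid();
        if (n < 2) {
            return 0.0;
        }
        double m = mean();
        // Clamp at zero to absorb floating-point rounding.
        return Math.max(0.0, (sumSquares - n * m * m) / (n - 1));
    }

    double standardDeviation() { return Math.sqrt(variance()); }

    double normL2() { return Math.sqrt(sumSquares); }

    public static void main(String[] args) {
        // The "height" column from the example above.
        ColumnSummary s = ColumnSummary.of(Arrays.asList(168, 165, 160, 163, 149));
        System.out.println(s.mean());     // prints 161.0
        System.out.println(s.variance()); // prints 53.5
    }
}
```

The sum-of-squares formula keeps the sketch to one pass over the data; an actual implementation might prefer a numerically more stable update such as Welford's algorithm.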
was:
We provide summary statistics for a Table through Summarizer. Users can easily
get the total count and the basic column-wise metrics: max, min, mean, variance,
standardDeviation, normL1, normL2, the number of missing values, and the number
of valid values.
SparkML offers the same functionality:
[http://spark.apache.org/docs/latest/ml-statistics.html#summarizer]
Example:
String[] colNames = new String[]{"id", "height", "weight"};
Row[] data = new Row[]{
    Row.of(1, 168, 48.1),
    Row.of(2, 165, 45.8),
    Row.of(3, 160, 45.3),
    Row.of(4, 163, 41.9),
    Row.of(5, 149, 40.5),
};
Table input = new MemSourceBatchOp(data, colNames).getTable();
TableSummary summary = new Summarizer(input).collectResult();
// Print the mean of the "height" column.
System.out.println(summary.mean("height"));
System.out.println(summary);
> Summarizer: summary statistics for Table
> ----------------------------------------
>
> Key: FLINK-12671
> URL: https://issues.apache.org/jira/browse/FLINK-12671
> Project: Flink
> Issue Type: Sub-task
> Components: Library / Machine Learning
> Reporter: Xu Yang
> Assignee: Xu Yang
> Priority: Major
>
> We provide summary statistics for a Table through Summarizer. Users can easily
> get the total count and the basic column-wise metrics: max, min, mean,
> variance, standardDeviation, normL1, normL2, the number of missing values, and
> the number of valid values.
> SparkML offers the same functionality:
> [http://spark.apache.org/docs/latest/ml-statistics.html#summarizer]
>
> {code:java|title=Example|borderStyle=solid}
> // Column names and sample rows for the input table.
> String[] colNames = new String[]{"id", "height", "weight"};
> Row[] data = new Row[]{
>     Row.of(1, 168, 48.1),
>     Row.of(2, 165, 45.8),
>     Row.of(3, 160, 45.3),
>     Row.of(4, 163, 41.9),
>     Row.of(5, 149, 40.5),
> };
> Table input = MLSession.createBatchTable(data, colNames);
> // Compute the summary statistics for every column.
> TableSummary summary = new Summarizer(input).collectResult();
> // Print the mean of the "height" column, then the full summary.
> System.out.println(summary.mean("height"));
> System.out.println(summary);
> {code}
>
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)