GitHub user jingyimei opened a pull request:
https://github.com/apache/madlib/pull/220
Add more stats to summary function
This PR adds the following statistics to madlib.summary():
- number of positive values
- number of negative values
- number of zero values
- 95% confidence interval on the mean
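To make the new statistics concrete, here is a small Python sketch of what each one measures. It is not MADlib code. The PR does not say how the interval is computed, so the sketch assumes a normal approximation (mean ± 1.96·s/√n, with s the sample standard deviation). Skipping NULLs (None) is also an assumption. The helper name summarize_numeric and its dictionary keys are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, stdev


def summarize_numeric(values, z=1.96):
    """Hypothetical helper: sign counts plus an approximate 95% CI on the mean."""
    # Assumption: missing values (None) are excluded from numeric statistics.
    xs = [v for v in values if v is not None]
    n = len(xs)
    result = {
        "positive_values": sum(1 for v in xs if v > 0),
        "negative_values": sum(1 for v in xs if v < 0),
        "zero_values": sum(1 for v in xs if v == 0),
    }
    if n >= 2:
        m = mean(xs)
        # Assumed normal approximation: half-width = z * s / sqrt(n).
        half_width = z * stdev(xs) / math.sqrt(n)
        result["mean"] = m
        result["confidence_interval"] = (m - half_width, m + half_width)
    else:
        # With fewer than two values the sample standard deviation is undefined.
        result["mean"] = xs[0] if n == 1 else None
        result["confidence_interval"] = None
    return result


r = summarize_numeric([3, -1, 0, 4, None, 2])
print(r["positive_values"], r["negative_values"], r["zero_values"], r["mean"])
```

For the sample input, the sketch counts three positive values, one negative value and one zero. The interval is centered on the mean of the five non-NULL values.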
The user documentation has been updated to describe the new fields.
In addition, 'row_count' in the summary() return value is renamed to
'num_col_summarized' to avoid confusion with the separate `row_count`
column in user_defined_summary_table.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jingyimei/madlib summary_more_stats
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/madlib/pull/220.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #220
----
commit cbd3cd4c0af5005ff4ea7d82cab74325221d4311
Author: Jingyi Mei <jmei@...>
Date: 2017-12-20T13:18:05Z
Add more stats to summary function
This commit adds the following statistics to madlib.summary():
- number of positive values
- number of negative values
- number of zero values
- 95% confidence interval on the mean
The user documentation has been updated to describe the new fields.
commit da7fea93b6fab72451643a0264b6e841448c9a4d
Author: Jingyi Mei <jmei@...>
Date: 2017-12-21T05:34:34Z
Rename 'row_count' to 'num_col_summarized' in summary() return result
Previously, running `SELECT * FROM madlib.summary(valid_inputs)` returned
a composite type containing a field named `row_count`, which refers to the
number of rows in the output table (one row per summarized column).
Running `SELECT * FROM user_defined_summary_table;` also returns a column
named `row_count`, which refers to the number of rows for the target
column.
To eliminate this confusion, we rename the first `row_count` to
`num_col_summarized`, revise its description, and update the user
documentation.
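The difference between the two counts can be modeled in a few lines of Python. This is a hypothetical illustration, not MADlib code. It assumes that a table is a dict mapping column names to lists of values and that the per-column `row_count` includes NULLs. The function name summarize_table is illustrative only.

```python
def summarize_table(columns):
    """Hypothetical model of summary(): one output row per summarized column."""
    summary_rows = []
    for name, values in columns.items():
        summary_rows.append({
            "target_column": name,
            # Per-column count, as in user_defined_summary_table:
            # rows seen for this target column (NULLs included, by assumption).
            "row_count": len(values),
            "missing_values": sum(1 for v in values if v is None),
        })
    # Top-level return field (formerly also called row_count): the number of
    # rows in the output table, i.e. how many columns were summarized.
    num_col_summarized = len(summary_rows)
    return num_col_summarized, summary_rows


n_cols, rows = summarize_table({"age": [30, None, 45, 51], "income": [10, 20, 30, 40]})
print(n_cols, [r["row_count"] for r in rows])
```

With two columns of four rows each, the top-level count is 2 and each per-column `row_count` is 4. Under the old naming, both numbers would have appeared as `row_count`.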
----