Repository: incubator-hivemall
Updated Branches:
  refs/heads/master 97bc91247 -> 1e83eb55d
Fixed a documentation error in AUC computation


Project: http://git-wip-us.apache.org/repos/asf/incubator-hivemall/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hivemall/commit/1e83eb55
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hivemall/tree/1e83eb55
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hivemall/diff/1e83eb55

Branch: refs/heads/master
Commit: 1e83eb55d7bedb6be965cc04a61d7a7d995abc9d
Parents: 97bc912
Author: myui <yuin...@gmail.com>
Authored: Tue Feb 28 20:06:05 2017 +0900
Committer: myui <yuin...@gmail.com>
Committed: Tue Feb 28 20:06:05 2017 +0900

----------------------------------------------------------------------
 docs/gitbook/eval/auc.md | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/1e83eb55/docs/gitbook/eval/auc.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/eval/auc.md b/docs/gitbook/eval/auc.md
index 3c8de95..3fba0bb 100644
--- a/docs/gitbook/eval/auc.md
+++ b/docs/gitbook/eval/auc.md
@@ -39,7 +39,7 @@ Once the rows are sorted by the probabilities in a descending order, AUC gives a

 # Compute AUC on Hivemall

-On Hivemall, a function `auc(double score, int label)` provides a way to compute AUC for pairs of probability and truth label.
+In Hivemall, a function `auc(double score, int label)` provides a way to compute AUC for pairs of probability and truth label.

 For instance, following query computes AUC of the table which was shown above:

@@ -54,13 +54,14 @@ with data as (
   select 0.8 as prob, 1 as label
   union all
   select 0.7 as prob, 1 as label
-), data_ordered as (
+)
+select
+  auc(prob, label) as auc
+from (
   select prob, label
   from data
-  order by prob desc
-)
-select auc(prob, label)
-from data_ordered;
+  ORDER BY prob DESC
+) t;
 ```

 This query returns `0.83333` as AUC.
@@ -80,16 +81,13 @@ with data as (
   select 0.8 as prob, 1 as label
   union all
   select 0.7 as prob, 1 as label
-), data_ordered as (
-  select prob, label
-  from data
-  order by prob desc
 )
-select auc(prob, label)
+select auc(prob, label) as auc
 from (
   select prob, label
-  from data_ordered
-  distribute by floor(prob / 0.2)
+  from data
+  DISTRIBUTE BY floor(prob / 0.2)
+  SORT BY prob DESC
 ) t;
 ```

@@ -101,4 +99,4 @@ Hivemall has another metric called [Logarithmic Loss](stat_eval.html#logarithmic

 Score produced by AUC is a relative metric based on sorted pairs. On the other hand, Logarithmic Loss simply gives a metric by comparing probability with its truth label one-by-one.

-To give an example, `auc(prob, label)` and `logloss(prob, label)` respectively returns `0.83333` and `0.54001` in the above case. Note that larger AUC and smaller Logarithmic Loss are better.
\ No newline at end of file
+To give an example, `auc(prob, label)` and `logloss(prob, label)` respectively returns `0.83333` and `0.54001` in the above case. Note that larger AUC and smaller Logarithmic Loss are better.
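The corrected queries in this diff pass rows to `auc` already sorted by `prob` in descending order. The first query uses `ORDER BY` in the subquery, and the `DISTRIBUTE BY` variant uses `SORT BY` within each partition. The plain-Python sketch below is not Hivemall's implementation. It shows why an AUC computed in one pass over the rows depends on that order, while log loss does not. It makes several assumptions. The input `ROWS` is hypothetical: only the `(0.8, 1)` and `(0.7, 1)` rows appear in this diff, and the other three rows are illustrative values chosen to match the quoted AUC. The function names `auc_single_pass` and `log_loss` are also hypothetical. The sketch covers only a single partition, with no merging of partial results, and does not treat ties in `prob` specially.

```python
import math
from typing import List, Tuple

# Hypothetical input: only (0.8, 1) and (0.7, 1) appear in the diff context;
# the remaining three rows are assumed for illustration.
ROWS: List[Tuple[float, int]] = [
    (0.5, 0), (0.3, 1), (0.2, 0), (0.8, 1), (0.7, 1),
]


def auc_single_pass(rows: List[Tuple[float, int]]) -> float:
    """One-pass AUC that is only correct if rows are sorted by prob descending.

    For each negative row, every positive seen earlier is counted as a
    correctly ranked (positive, negative) pair. Ties are not handled.
    """
    pos_seen = 0
    correct_pairs = 0
    n_pos = n_neg = 0
    for _prob, label in rows:
        if label == 1:
            pos_seen += 1
            n_pos += 1
        else:
            correct_pairs += pos_seen
            n_neg += 1
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative row")
    return correct_pairs / (n_pos * n_neg)


def log_loss(rows: List[Tuple[float, int]]) -> float:
    """Mean negative log-likelihood; each row is scored on its own."""
    total = 0.0
    for prob, label in rows:
        total += -math.log(prob) if label == 1 else -math.log(1.0 - prob)
    return total / len(rows)


sorted_rows = sorted(ROWS, key=lambda r: r[0], reverse=True)
print(round(auc_single_pass(sorted_rows), 5))  # 0.83333 (5 of 6 pairs)
print(round(auc_single_pass(ROWS), 5))         # 0.16667 on unsorted input
print(round(log_loss(ROWS), 6))                # 0.540016, order-independent
```

On the unsorted input, the same one-pass function returns 1/6 instead of 5/6 and raises no error. Sorting before aggregation prevents this kind of silent mistake. `log_loss` scores each row independently, so it returns about 0.540016 in any row order. That value is consistent with the `0.54001` quoted above.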