Documentation fixes for Evaluation
Project: http://git-wip-us.apache.org/repos/asf/incubator-hivemall/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hivemall/commit/b0584733
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hivemall/tree/b0584733
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hivemall/diff/b0584733

Branch: refs/heads/master
Commit: b05847334b7a488d36315e52663cecd7f7d56e4c
Parents: 098a7f3
Author: Makoto Yui <m...@apache.org>
Authored: Wed Sep 13 22:54:42 2017 +0900
Committer: Makoto Yui <m...@apache.org>
Committed: Wed Sep 13 22:54:42 2017 +0900

----------------------------------------------------------------------
 docs/gitbook/eval/auc.md                        |  2 +-
 .../eval/binary_classification_measures.md      | 18 ++++++++--------
 .../eval/multilabel_classification_measures.md  | 22 +++++++++-----------
 3 files changed, 20 insertions(+), 22 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/b0584733/docs/gitbook/eval/auc.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/eval/auc.md b/docs/gitbook/eval/auc.md
index b8f7f0b..6543d32 100644
--- a/docs/gitbook/eval/auc.md
+++ b/docs/gitbook/eval/auc.md
@@ -100,7 +100,7 @@ Note that `floor(prob / 0.2)` means that the rows are distributed to 5 bins for
 
 # Difference between AUC and Logarithmic Loss
 
-Hivemall has another metric called [Logarithmic Loss](stat_eval.html#logarithmic-loss) for binary classification. Both AUC and Logarithmic Loss compute scores for probability-label pairs.
+Hivemall has another metric called [Logarithmic Loss](regression.html#logarithmic-loss) for binary classification. Both AUC and Logarithmic Loss compute scores for probability-label pairs.
 
 Score produced by AUC is a relative metric based on sorted pairs. On the other hand, Logarithmic Loss simply gives a metric by comparing probability with its truth label one-by-one.
 
http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/b0584733/docs/gitbook/eval/binary_classification_measures.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/eval/binary_classification_measures.md b/docs/gitbook/eval/binary_classification_measures.md
index 5121ffe..ddb7bff 100644
--- a/docs/gitbook/eval/binary_classification_measures.md
+++ b/docs/gitbook/eval/binary_classification_measures.md
@@ -21,29 +21,26 @@
 
 # Binary problems
 
-Binary classification problem is the task to predict the label of each data given two categories.
+Binary classification is a task to predict a label of each data given two categories.
 
-Hivemall provides some tutorials to deal with binary classification problems as follows:
+Hivemall provides several tutorials to deal with binary classification problems as follows:
 
 - [Online advertisement click prediction](../binaryclass/general.html)
 - [News classification](../binaryclass/news20_dataset.html)
 
-This page focuses on the evaluation for such binary classification problems.
+This page focuses on the evaluation of such binary classification problems.
 
 If your classifier outputs probability rather than 0/1 label, evaluation based on [Area Under the ROC Curve](./auc.md) would be more appropriate.
 
 # Example
 
-For the metrics explanation, this page introduces toy example data and two metrics.
+This page introduces toy example data and two metrics for explanation.
 
 ## Data
 
-The following table shows the sample of binary classification's prediction.
-In this case, `1` means positive label and `0` means negative label.
-Left column includes supervised label data,
-and center column includes predicted label by a binary classifier.
+The following table shows examples of binary classification's prediction.
 
-| truth label| predicted label | |
+| truth label| predicted label | description |
 |:---:|:---:|:---:|
 | 1 | 0 |False Negative|
 | 0 | 1 |False Positive|
@@ -52,6 +49,9 @@ and center column includes predicted label by a binary classifier.
 | 0 | 1 |False Positive|
 | 0 | 0 |True Negative|
 
+In this case, `1` means positive label and `0` means negative label.
+The leftmost column shows truth labels, and center column includes predicted labels.
+
 ## Preliminary metrics
 
 Some evaluation metrics are calculated based on 4 values:

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/b0584733/docs/gitbook/eval/multilabel_classification_measures.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/eval/multilabel_classification_measures.md b/docs/gitbook/eval/multilabel_classification_measures.md
index fb2d6c0..4baa178 100644
--- a/docs/gitbook/eval/multilabel_classification_measures.md
+++ b/docs/gitbook/eval/multilabel_classification_measures.md
@@ -22,21 +22,21 @@
 
 # Multi-label classification
 
-Multi-label classification problem is the task to predict the labels given categorized dataset.
-Each sample $$i$$ has $$l_i$$ labels, where $$L$$ is a set of unique labels in the dataset, and $$0 \leq l_i \leq |L|$$.
+Multi-label classification problem is a task to predict labels given two or more categories.
 
-This page focuses on evaluation of the results from such multi-label classification problems.
+Each sample $$i$$ has $$l_i$$ labels, where $$L$$ is a set of unique labels in the dataset, and $$0 \leq l_i \leq |L|$$.
+This page focuses on evaluation of such multi-label classification problems.
 
 # Example
 
-For the metrics explanation, this page introduces toy example dataset.
+This page introduces toy example dataset for explanation.
 
 ## Data
 
-The following table shows the sample of multi-label classification's prediction.
-Animal names represent the tags of blog post.
-Left column includes supervised labels,
-and right column includes predicted labels by a multi-label classifier.
+The following table shows examples of multi-label classification's prediction.
+
+Suppose that animal names represent tags of blog posts and the given task is to predict tags for blog posts.
+The left column shows the ground truth labels and the right column shows predicted labels by a multi-label classifier.
 
 | truth labels| predicted labels |
 |:---:|:---:|
@@ -53,10 +53,8 @@ and right column includes predicted labels by a multi-label classifier.
 
 Hivemall provides micro F1-score and micro F-measure.
 
-Define $$L$$ is the set of the tag of blog posts, and
-$$l_i$$ is a tag set of $$i$$th document.
-In the same manner,
-$$p_i$$ is a predicted tag set of $$i$$th document.
+Define $$L$$ is the set of the tag of blog posts, and $$l_i$$ is a tag set of $$i$$-th document.
+In the same manner, $$p_i$$ is a predicted tag set of $$i$$-th document.
 
 ## Micro F1-score