spark git commit: [DOCS] Fixed NDCG formula issues

srowen Mon, 20 Aug 2018 12:59:34 -0700

Repository: spark
Updated Branches:
  refs/heads/branch-2.3 ea01e362f -> 9702bb637



[DOCS] Fixed NDCG formula issues

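When j is 0, ln(j+1) is 0, which causes a division by zero in the NDCG and IDCG formulas. This change uses ln(j+2) as the discount instead and updates the NDCG reference link.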
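
The division-by-zero case described above can be reproduced with a minimal Python sketch. The function names `discount_before_fix` and `discount_after_fix` are hypothetical, chosen only for this illustration; they are not part of Spark. The sketch assumes zero-based positions j, as in the documented sums.

```python
import math


def discount_before_fix(j: int) -> float:
    """Discount 1 / ln(j + 1), as the docs read before this commit."""
    return 1.0 / math.log(j + 1)


def discount_after_fix(j: int) -> float:
    """Discount 1 / ln(j + 2), as the docs read after this commit."""
    return 1.0 / math.log(j + 2)


# With zero-based positions, the first term of the old sum is undefined:
# ln(0 + 1) == 0, so Python raises ZeroDivisionError.
try:
    discount_before_fix(0)
    old_formula_fails = False
except ZeroDivisionError:
    old_formula_fails = True

# The corrected discount is finite at every position j >= 0.
first_discount = discount_after_fix(0)  # equals 1 / ln(2)
```

Shifting the argument by one keeps the discount strictly decreasing in j while making the first term well defined.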

Closes #22090 from yueguoguo/patch-1.

Authored-by: Zhang Le <yueguo...@users.noreply.github.com>
Signed-off-by: Sean Owen <sean.o...@databricks.com>
(cherry picked from commit 219ed7b487c2dfb5007247f77ebf1b3cc73cecb5)
Signed-off-by: Sean Owen <sean.o...@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9702bb63
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9702bb63
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9702bb63

Branch: refs/heads/branch-2.3
Commit: 9702bb637d5ac665fefaa96cc69c5f92553f613a
Parents: ea01e36
Author: Zhang Le <yueguo...@users.noreply.github.com>
Authored: Mon Aug 20 14:59:03 2018 -0500
Committer: Sean Owen <sean.o...@databricks.com>
Committed: Mon Aug 20 14:59:21 2018 -0500

----------------------------------------------------------------------
 docs/mllib-evaluation-metrics.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/9702bb63/docs/mllib-evaluation-metrics.md
----------------------------------------------------------------------
diff --git a/docs/mllib-evaluation-metrics.md b/docs/mllib-evaluation-metrics.md
index 7f27754..ac398fb 100644
--- a/docs/mllib-evaluation-metrics.md
+++ b/docs/mllib-evaluation-metrics.md
@@ -462,13 +462,13 @@ $$rel_D(r) = \begin{cases}1 & \text{if $r \in D$}, \\ 0 & \text{otherwise}.\end{
       <td>Normalized Discounted Cumulative Gain</td>
       <td>
         $NDCG(k)=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{IDCG(D_i, k)}\sum_{j=0}^{n-1}
-          \frac{rel_{D_i}(R_i(j))}{\text{ln}(j+1)}} \\
+          \frac{rel_{D_i}(R_i(j))}{\text{ln}(j+2)}} \\
         \text{Where} \\
         \hspace{5 mm} n = \text{min}\left(\text{max}\left(|R_i|,|D_i|\right),k\right) \\
-        \hspace{5 mm} IDCG(D, k) = \sum_{j=0}^{\text{min}(\left|D\right|, k) - 1} \frac{1}{\text{ln}(j+1)}$
+        \hspace{5 mm} IDCG(D, k) = \sum_{j=0}^{\text{min}(\left|D\right|, k) - 1} \frac{1}{\text{ln}(j+2)}$
       </td>
       <td>
-        <a href="https://en.wikipedia.org/wiki/Information_retrieval#Discounted_cumulative_gain">NDCG at k</a> is a
+        <a href="https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG">NDCG at k</a> is a
         measure of how many of the first k recommended documents are in the set of true relevant documents averaged
         across all users. In contrast to precision at k, this metric takes into account the order of the recommendations
         (documents are assumed to be in order of decreasing relevance).
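
As an illustration of the corrected formula in the hunk above, here is a minimal Python sketch that evaluates NDCG at k with the ln(j+2) discount. The names `ndcg_at_k` and `mean_ndcg_at_k` are hypothetical and are not Spark APIs. Two choices are assumptions of this sketch rather than something the diff states: positions past the end of the recommendation list count as not relevant, and a query with an empty ground-truth set scores 0.0.

```python
import math
from typing import Hashable, Iterable, Sequence, Set, Tuple


def ndcg_at_k(predicted: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """NDCG at k for one query, using binary relevance and 1 / ln(j + 2)."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        # Assumption of this sketch: no ground truth means a score of 0.
        return 0.0
    # n = min(max(|R_i|, |D_i|), k), as in the documented formula.
    n = min(max(len(predicted), len(relevant)), k)
    # Positions beyond the end of `predicted` contribute nothing (assumption).
    dcg = sum(
        1.0 / math.log(j + 2)
        for j in range(min(n, len(predicted)))
        if predicted[j] in relevant
    )
    # IDCG(D, k): every one of the first min(|D|, k) positions is relevant.
    idcg = sum(1.0 / math.log(j + 2) for j in range(min(len(relevant), k)))
    return dcg / idcg


def mean_ndcg_at_k(
    queries: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]], k: int
) -> float:
    """Average of per-query NDCG at k over M queries (the 1/M outer sum)."""
    scores = [ndcg_at_k(predicted, relevant, k) for predicted, relevant in queries]
    if not scores:
        raise ValueError("at least one query is required")
    return sum(scores) / len(scores)


# Usage: one perfectly ranked query and one with the relevant item in second place.
example_queries = [(["a", "b", "c"], {"a", "b"}), (["x", "a"], {"a"})]
example_score = mean_ndcg_at_k(example_queries, k=3)
```

Because NDCG is a ratio of two sums that use the same logarithm, the base of the logarithm cancels, so the natural log here gives the same value as a base-2 discount would.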


