Repository: spark
Updated Branches:
  refs/heads/branch-2.3 ea01e362f -> 9702bb637

[DOCS] Fixed NDCG formula issues

When j is 0, ln(j+1) is 0, which leads to division by zero. The NDCG and IDCG formulas now use ln(j+2) as the discount, and the reference link points to the Wikipedia article on discounted cumulative gain.

Closes #22090 from yueguoguo/patch-1.

Authored-by: Zhang Le <yueguo...@users.noreply.github.com>
Signed-off-by: Sean Owen <sean.o...@databricks.com>
(cherry picked from commit 219ed7b487c2dfb5007247f77ebf1b3cc73cecb5)
Signed-off-by: Sean Owen <sean.o...@databricks.com>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9702bb63
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9702bb63
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9702bb63

Branch: refs/heads/branch-2.3
Commit: 9702bb637d5ac665fefaa96cc69c5f92553f613a
Parents: ea01e36
Author: Zhang Le <yueguo...@users.noreply.github.com>
Authored: Mon Aug 20 14:59:03 2018 -0500
Committer: Sean Owen <sean.o...@databricks.com>
Committed: Mon Aug 20 14:59:21 2018 -0500

----------------------------------------------------------------------
 docs/mllib-evaluation-metrics.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/9702bb63/docs/mllib-evaluation-metrics.md
----------------------------------------------------------------------
diff --git a/docs/mllib-evaluation-metrics.md b/docs/mllib-evaluation-metrics.md
index 7f27754..ac398fb 100644
--- a/docs/mllib-evaluation-metrics.md
+++ b/docs/mllib-evaluation-metrics.md
@@ -462,13 +462,13 @@ $$rel_D(r) = \begin{cases}1 & \text{if $r \in D$}, \\ 0 & \text{otherwise}.\end{
       <td>Normalized Discounted Cumulative Gain</td>
       <td>
         $NDCG(k)=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{IDCG(D_i, k)}\sum_{j=0}^{n-1}
-          \frac{rel_{D_i}(R_i(j))}{\text{ln}(j+1)}} \\
+          \frac{rel_{D_i}(R_i(j))}{\text{ln}(j+2)}} \\
           \text{Where} \\
           \hspace{5 mm} n = \text{min}\left(\text{max}\left(|R_i|,|D_i|\right),k\right) \\
-          \hspace{5 mm} IDCG(D, k) = \sum_{j=0}^{\text{min}(\left|D\right|, k) - 1} \frac{1}{\text{ln}(j+1)}$
+          \hspace{5 mm} IDCG(D, k) = \sum_{j=0}^{\text{min}(\left|D\right|, k) - 1} \frac{1}{\text{ln}(j+2)}$
       </td>
       <td>
-        <a href="https://en.wikipedia.org/wiki/Information_retrieval#Discounted_cumulative_gain">NDCG at k</a> is a
+        <a href="https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG">NDCG at k</a> is a
         measure of how many of the first k recommended documents are in the set of true relevant documents averaged
         across all users. In contrast to precision at k, this metric takes into account the order of the recommendations
         (documents are assumed to be in order of decreasing relevance).
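
For illustration only, the corrected formula can be sketched in plain Python. This is not Spark's implementation; the function names `ndcg_at_k` and `mean_ndcg_at_k` are hypothetical, and two behaviours are assumptions not stated in the patch: positions beyond the end of the recommendation list count as not relevant, and a user with no relevant documents scores 0.

```python
import math


def ndcg_at_k(recommended, relevant, k):
    """NDCG at k for one user, using the corrected discount ln(j + 2).

    recommended: ranked list of document ids (R_i), best first.
    relevant: collection of ground-truth relevant document ids (D_i).
    """
    relevant_set = set(relevant)
    if not relevant_set:
        # Assumption: with no relevant documents, IDCG would be 0, so return 0.
        return 0.0

    # n = min(max(|R_i|, |D_i|), k)
    n = min(max(len(recommended), len(relevant_set)), k)

    dcg = 0.0
    for j in range(n):
        # Assumption: ranks past the end of the list contribute nothing.
        if j < len(recommended) and recommended[j] in relevant_set:
            dcg += 1.0 / math.log(j + 2)

    # IDCG(D, k) = sum over j in [0, min(|D|, k) - 1] of 1 / ln(j + 2)
    idcg = sum(1.0 / math.log(j + 2) for j in range(min(len(relevant_set), k)))
    return dcg / idcg


def mean_ndcg_at_k(pairs, k):
    """Average NDCG at k over (recommended, relevant) pairs, one per user."""
    return sum(ndcg_at_k(r, d, k) for r, d in pairs) / len(pairs)
```

With the old discount, the first term would be `1.0 / math.log(0 + 1)`, which divides by zero at j = 0. With ln(j+2), a relevant document at rank j = 0 contributes 1/ln 2, and a perfectly ordered list yields a score of 1.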