[ 
https://issues.apache.org/jira/browse/HIVEMALL-124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165970#comment-16165970
 ] 

Makoto Yui commented on HIVEMALL-124:
-------------------------------------

Changes have been made in [this 
commit|https://github.com/myui/hivemall/blame/v0.5-alpha.1/core/src/main/java/hivemall/evaluation/BinaryResponsesMeasures.java#L45]
 for nDCG@k.

[~takuti] I think the current code has a bug and [librec's 
one|https://github.com/guoguibing/librec/blob/f49ee52686168a334ce558496ea3fb2fd42701ca/core/src/main/java/net/librec/eval/ranking/NormalizedDCGEvaluator.java#L66]
 is correct. 

{code:java}
double idcg = IDCG(Math.min(recommendSize, groundTruth.size()));
for (int i = 0, n = recommendSize; i < n; i++) {
    Object item_id = rankedList.get(i); // may throw IndexOutOfBoundsException if rankedList.size() < recommendSize
    ..
{code}

should be

{code:java}
final int k = Math.min(rankedList.size(), recommendSize);
for (int i = 0; i < k; i++) {
  ..
}

double idcg = IDCG(Math.min(groundTruth.size(), k));
{code}

What do you think? (cc: [~uhyonc] )
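
For reference, here is a minimal self-contained sketch of the corrected version, assuming the elided loop body is the same binary-relevance DCG accumulation quoted in the issue description. The class name {{NdcgSketch}} and the zero guard on the final division are my assumptions for illustration, not the committed code. Note that it keeps the IDCG based on {{min(groundTruth.size(), k)}} rather than on the number of matched items.

```java
import java.util.List;

final class NdcgSketch {

    /** Ideal DCG for binary relevance when n relevant items fill ranks 1..n. */
    static double IDCG(final int n) {
        double idcg = 0.d;
        for (int i = 0; i < n; i++) {
            idcg += Math.log(2) / Math.log(i + 2);
        }
        return idcg;
    }

    /** Binary-relevance nDCG@recommendSize that tolerates short ranked lists. */
    static double nDCG(final List<?> rankedList, final List<?> groundTruth,
            final int recommendSize) {
        // Never read past the end of rankedList.
        final int k = Math.min(rankedList.size(), recommendSize);

        double dcg = 0.d;
        for (int i = 0; i < k; i++) {
            final Object itemId = rankedList.get(i);
            if (!groundTruth.contains(itemId)) {
                continue;
            }
            final int rank = i + 1;
            dcg += Math.log(2) / Math.log(rank + 1);
        }

        // At most min(|groundTruth|, k) relevant items can appear in the top k.
        final double idcg = IDCG(Math.min(groundTruth.size(), k));

        // Assumption: report 0 instead of NaN when no relevant item is possible.
        return (idcg == 0.d) ? 0.d : dcg / idcg;
    }
}
```

With this sketch, {{nDCG(["a", "b"], ["a", "b", "c"], 5)}} evaluates to 1.0 because both k and the IDCG length are capped at 2, whereas the loop in the current code would try to read index 2 of a two-element list.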



> NDCG - BinaryResponseMeasure "fix"
> ----------------------------------
>
>                 Key: HIVEMALL-124
>                 URL: https://issues.apache.org/jira/browse/HIVEMALL-124
>             Project: Hivemall
>          Issue Type: Improvement
>            Reporter: Uhyon Chung
>            Assignee: Takuya Kitazawa
>
> There's a small issue which makes it a bit hard to use the NDCG@x
> from BinaryResponseMeasure.java
> {code:java}
>     public static double nDCG(@Nonnull final List<?> rankedList,
>             @Nonnull final List<?> groundTruth, @Nonnull final int recommendSize) {
>         double dcg = 0.d;
>         double idcg = IDCG(Math.min(recommendSize, groundTruth.size()));
> ...
>     public static double IDCG(final int n) {
>         double idcg = 0.d;
>         for (int i = 0; i < n; i++) {
>             idcg += Math.log(2) / Math.log(i + 2);
>         }
>         return idcg;
>     }
> {code}
> Notice that the IDCG for the binary NDCG calculation is computed from the 
> number of items in groundTruth. The problem is that when we use 
> "recommendSize" (e.g. NDCG@10), we may pass all of the ground truth and not 
> just the items in the first 10. This is a bit unexpected. Of course, we could 
> limit the truths using array intersection or similar, but users 
> shouldn't really have to do that. You can simply count the number of matched 
> ground truths instead, which makes this function easier to use.
> e.g.
> {code:java}
>     public static double nDCG(@Nonnull final List<?> rankedList,
>             @Nonnull final List<?> groundTruth, @Nonnull final int recommendSize) {
>         double dcg = 0.d;
>         int matchedGroundTruths = 0;
>         for (int i = 0, n = recommendSize; i < n; i++) {
>             Object item_id = rankedList.get(i);
>             if (!groundTruth.contains(item_id)) {
>                 continue;
>             }
>             int rank = i + 1;
>             dcg += Math.log(2) / Math.log(rank + 1);
>             matchedGroundTruths++;
>         }
>         double idcg = IDCG(matchedGroundTruths);
>         return dcg / idcg;
>     }
> {code}
> Thanks



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
