[
https://issues.apache.org/jira/browse/HIVEMALL-124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165970#comment-16165970
]
Makoto Yui commented on HIVEMALL-124:
-------------------------------------
Changes were made in [this
commit|https://github.com/myui/hivemall/blame/v0.5-alpha.1/core/src/main/java/hivemall/evaluation/BinaryResponsesMeasures.java#L45]
for nDCG@k.
[~takuti] I think the current code has a bug, and [librec's
implementation|https://github.com/guoguibing/librec/blob/f49ee52686168a334ce558496ea3fb2fd42701ca/core/src/main/java/net/librec/eval/ranking/NormalizedDCGEvaluator.java#L66]
is correct.
{code:java}
double idcg = IDCG(Math.min(recommendSize, groundTruth.size()));
for (int i = 0, n = recommendSize; i < n; i++) {
Object item_id = rankedList.get(i); // may throw IndexOutOfBoundsException when recommendSize > rankedList.size()
..
{code}
should be
{code:java}
final int k = Math.min(rankedList.size(), recommendSize);
for (int i = 0; i < k; i++) {
..
}
double idcg = IDCG(Math.min(groundTruth.size(), k));
{code}
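As a fuller illustration of the change proposed above, here is a minimal, self-contained sketch. The class name NdcgSketch, the HashSet lookup, and returning 0 when the ideal DCG is 0 are assumptions made for illustration and are not taken from the Hivemall code base; the IDCG loop follows the one quoted in the issue description.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone class; not the actual Hivemall BinaryResponsesMeasures.
final class NdcgSketch {

    /** Ideal DCG for n relevant items under binary relevance. */
    static double idcg(final int n) {
        double idcg = 0.d;
        for (int i = 0; i < n; i++) {
            idcg += Math.log(2) / Math.log(i + 2);
        }
        return idcg;
    }

    static double nDCG(final List<?> rankedList, final List<?> groundTruth,
            final int recommendSize) {
        // Clamp k so that rankedList.get(i) never goes out of range.
        final int k = Math.min(rankedList.size(), recommendSize);
        // Assumption: a set is used only to speed up membership checks.
        final Set<Object> truth = new HashSet<>(groundTruth);

        double dcg = 0.d;
        for (int i = 0; i < k; i++) {
            if (truth.contains(rankedList.get(i))) {
                dcg += Math.log(2) / Math.log(i + 2); // rank = i + 1
            }
        }

        // The ideal ranking can place at most min(|groundTruth|, k) hits.
        final double idcg = idcg(Math.min(groundTruth.size(), k));
        // Assumption: return 0 rather than NaN when there is nothing to rank.
        return (idcg == 0.d) ? 0.d : dcg / idcg;
    }
}
```

With this clamping, a call such as nDCG(Arrays.asList(1, 2, 3), Arrays.asList(1, 2, 3), 10) evaluates to 1.0 instead of failing on rankedList.get(3).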
What do you think? (cc: [~uhyonc])
> NDCG - BinaryResponseMeasure "fix"
> ----------------------------------
>
> Key: HIVEMALL-124
> URL: https://issues.apache.org/jira/browse/HIVEMALL-124
> Project: Hivemall
> Issue Type: Improvement
> Reporter: Uhyon Chung
> Assignee: Takuya Kitazawa
>
> There's a small issue that makes NDCG@x in
> BinaryResponsesMeasures.java a bit hard to use:
> {code:java}
> public static double nDCG(@Nonnull final List<?> rankedList,
> @Nonnull final List<?> groundTruth, @Nonnull final int
> recommendSize) {
> double dcg = 0.d;
> double idcg = IDCG(Math.min(recommendSize, groundTruth.size()));
> ...
> public static double IDCG(final int n) {
> double idcg = 0.d;
> for (int i = 0; i < n; i++) {
> idcg += Math.log(2) / Math.log(i + 2);
> }
> return idcg;
> }
> {code}
> Notice that, for binary NDCG, the IDCG is computed from the number of items
> in groundTruth. The problem is that when we use "recommendSize" (e.g.
> NDCG@10), we may pass the entire ground truth and not just the items that
> appear in the first 10 recommendations. This is a bit unexpected. Of course,
> callers could restrict the ground truth themselves (e.g. by intersecting the
> arrays), but users shouldn't have to do that. The function could simply count
> the number of matched ground truths itself, which would make it easier to use.
> e.g.
> {code:java}
> public static double nDCG(@Nonnull final List<?> rankedList,
> @Nonnull final List<?> groundTruth, @Nonnull final int
> recommendSize) {
> double dcg = 0.d;
> int matchedGroundTruths = 0;
> for (int i = 0, n = recommendSize; i < n; i++) {
> Object item_id = rankedList.get(i);
> if (!groundTruth.contains(item_id)) {
> continue;
> }
> int rank = i + 1;
> dcg += Math.log(2) / Math.log(rank + 1);
> matchedGroundTruths++;
> }
> double idcg = IDCG(matchedGroundTruths);
> return dcg / idcg;
> }
> {code}
> Thanks