[
https://issues.apache.org/jira/browse/HIVEMALL-124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16073020#comment-16073020
]
Uhyon Chung commented on HIVEMALL-124:
--------------------------------------
Actually, this would cause problems when matchedGroundTruths = 0: idcg is then
0, and since dcg is also 0, dcg / idcg evaluates to NaN. It might be better to
do something like the following, so we can bail out early when idcg = 0
(NDCGUDF.java, line 172):
{code:java}
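// Keep only the ground-truth items that also appear in the recommended list.
truthList.retainAll(recommendList);
if (truthList.isEmpty()) {
    // No matches, so idcg would be 0; skip out early.
    return;
}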
{code}
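For illustration, here is a minimal standalone sketch of the zero-match case and one possible guard inside nDCG itself. The class name NdcgSketch, returning 0 when nothing matches, and bounding the loop by rankedList.size() are assumptions made for this example, not what Hivemall actually does.

```java
import java.util.Arrays;
import java.util.List;

public class NdcgSketch {

    // Same formula as IDCG in the quoted snippet: ideal DCG when the
    // first n positions are all relevant.
    public static double idcg(final int n) {
        double idcg = 0.d;
        for (int i = 0; i < n; i++) {
            idcg += Math.log(2) / Math.log(i + 2);
        }
        return idcg;
    }

    // Proposed nDCG from the issue, plus a guard for the zero-match case.
    public static double nDCG(final List<?> rankedList, final List<?> groundTruth,
            final int recommendSize) {
        double dcg = 0.d;
        int matchedGroundTruths = 0;
        // Assumption: rankedList may be shorter than recommendSize, so bound by both.
        final int n = Math.min(recommendSize, rankedList.size());
        for (int i = 0; i < n; i++) {
            if (!groundTruth.contains(rankedList.get(i))) {
                continue;
            }
            int rank = i + 1;
            dcg += Math.log(2) / Math.log(rank + 1);
            matchedGroundTruths++;
        }
        if (matchedGroundTruths == 0) {
            // Assumption: score 0 when nothing matched. Without this guard the
            // result would be 0.0 / 0.0, which is NaN in Java.
            return 0.d;
        }
        return dcg / idcg(matchedGroundTruths);
    }

    public static void main(String[] args) {
        List<String> ranked = Arrays.asList("a", "b", "c");
        System.out.println(0.d / idcg(0));                             // NaN
        System.out.println(nDCG(ranked, Arrays.asList("x", "y"), 3));  // 0.0
        System.out.println(nDCG(ranked, Arrays.asList("a", "b"), 3));  // 1.0
    }
}
```

Whether to return 0 or to skip the row entirely (as in the retainAll snippet above) is a design choice; skipping keeps zero-match rows out of any downstream average, while returning 0 counts them as the worst score.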
> NDCG - BinaryResponseMeasure "fix"
> ----------------------------------
>
> Key: HIVEMALL-124
> URL: https://issues.apache.org/jira/browse/HIVEMALL-124
> Project: Hivemall
> Issue Type: Improvement
> Reporter: Uhyon Chung
> Assignee: Takuya Kitazawa
>
> There's a small issue that makes NDCG@x from BinaryResponseMeasure.java
> a bit hard to use:
> {code:java}
> public static double nDCG(@Nonnull final List<?> rankedList,
>         @Nonnull final List<?> groundTruth, @Nonnull final int recommendSize) {
>     double dcg = 0.d;
>     double idcg = IDCG(Math.min(recommendSize, groundTruth.size()));
>     ...
> public static double IDCG(final int n) {
>     double idcg = 0.d;
>     for (int i = 0; i < n; i++) {
>         idcg += Math.log(2) / Math.log(i + 2);
>     }
>     return idcg;
> }
> {code}
> You'll notice that the binary NDCG calculation derives idcg from the number
> of items in groundTruth. The problem is that when we use "recommendSize"
> (e.g. NDCG@10), we may pass the full ground truth and not just the items
> that appear in the first 10. This is a bit unexpected. Of course, callers
> could limit the ground truth themselves (e.g. by intersecting it with the
> ranked list), but they shouldn't really have to. Simply counting the number
> of matched ground truths inside the function makes it easier to use.
> e.g.
> {code:java}
> public static double nDCG(@Nonnull final List<?> rankedList,
>         @Nonnull final List<?> groundTruth, @Nonnull final int recommendSize) {
>     double dcg = 0.d;
>     int matchedGroundTruths = 0;
>     for (int i = 0, n = recommendSize; i < n; i++) {
>         Object item_id = rankedList.get(i);
>         if (!groundTruth.contains(item_id)) {
>             continue;
>         }
>         int rank = i + 1;
>         dcg += Math.log(2) / Math.log(rank + 1);
>         matchedGroundTruths++;
>     }
>     double idcg = IDCG(matchedGroundTruths);
>     return dcg / idcg;
> }
> {code}
> Thanks