Hi,
I noticed the following behaviour, which seems a bit strange to me:
Let v=(v1, v2, ..., vn) and w=(w1, w2, ..., wm) be the vectors that are
used to compute the similarity between two items/users. If all vi that
overlap with w (that is, vi!=0 and wi!=0) are equal, and all wj that
overlap with v are equal, neither the Euclidean nor the Pearson
similarity can be computed.

The attached test considers the following vectors: v=(0.2, 0.2, 0.4) and
w=(0.7, 0.7, 0). The overlapping vector components of v are all 0.2. The
overlapping components of w are all 0.7.
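To illustrate why a correlation is undefined for such vectors, here is a minimal sketch in plain Java. It does not use Mahout; the class and method names are made up for this mail, and treating 0 as "no preference" is an assumption taken from the overlap definition above. It mean-centers the overlapping components and computes the usual Pearson quotient.

```java
public class OverlapPearsonSketch {

    /**
     * Pearson correlation restricted to the positions where both
     * vectors are non-zero (hypothetical helper, not Mahout code).
     */
    static double pearsonOnOverlap(double[] v, double[] w) {
        int len = Math.min(v.length, w.length);
        int n = 0;
        double sumV = 0.0;
        double sumW = 0.0;
        // First pass: count the overlap and sum the overlapping components.
        for (int i = 0; i < len; i++) {
            if (v[i] != 0.0 && w[i] != 0.0) {
                n++;
                sumV += v[i];
                sumW += w[i];
            }
        }
        if (n == 0) {
            return Double.NaN; // no overlap at all
        }
        double meanV = sumV / n;
        double meanW = sumW / n;
        double sumXY = 0.0;
        double sumX2 = 0.0;
        double sumY2 = 0.0;
        // Second pass: accumulate the sums over the mean-centered values.
        for (int i = 0; i < len; i++) {
            if (v[i] != 0.0 && w[i] != 0.0) {
                double x = v[i] - meanV;
                double y = w[i] - meanW;
                sumXY += x * y;
                sumX2 += x * x;
                sumY2 += y * y;
            }
        }
        // With constant overlapping components all three sums are 0,
        // so this evaluates 0.0 / 0.0, which is NaN for doubles.
        return sumXY / Math.sqrt(sumX2 * sumY2);
    }

    public static void main(String[] args) {
        double[] v = {0.2, 0.2, 0.4};
        double[] w = {0.7, 0.7, 0.0};
        System.out.println(pearsonOnOverlap(v, w)); // prints NaN
    }
}
```

For the two vectors from the test, both centered overlaps are exactly zero, so the quotient is 0/0 and the result is NaN.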

The problem is that "double computeResult(int n, double sumXY, double
sumX2, double sumY2, double sumXYdiff2)" in the corresponding subclass
of AbstractSimilarity is called with sumXY=sumX2=sumY2=0 and therefore
returns Double.NaN. This behaviour contradicts the behaviour described
in the book "Mahout in Action", p. 49. The last complete sentence there
is: "Note that we were able compute some notion of similarity for all
pairs of users here, whereas the Pearson correlation couldn't produce an
answer for users 1 and 3." Because of the problem described above, the
Euclidean algorithm can't produce an answer either. The case in the book
is a special case of this problem, in which there is only one overlap.
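For comparison, here is a minimal sketch of a distance-based similarity of the textbook form 1/(1+d), computed on the uncentered overlapping components. This is an assumption about what "some notion of similarity" could look like, not a copy of what EuclideanDistanceSimilarity does internally; the class and method names are hypothetical.

```java
public class EuclideanOverlapSketch {

    /**
     * Similarity 1 / (1 + d), where d is the Euclidean distance over the
     * positions where both vectors are non-zero (hypothetical helper,
     * not Mahout code). No mean-centering is applied.
     */
    static double similarityOnOverlap(double[] v, double[] w) {
        int len = Math.min(v.length, w.length);
        int n = 0;
        double sumSqDiff = 0.0;
        for (int i = 0; i < len; i++) {
            if (v[i] != 0.0 && w[i] != 0.0) {
                double diff = v[i] - w[i];
                sumSqDiff += diff * diff;
                n++;
            }
        }
        if (n == 0) {
            return Double.NaN; // nothing to compare
        }
        return 1.0 / (1.0 + Math.sqrt(sumSqDiff));
    }

    public static void main(String[] args) {
        // Vectors from the attached test: overlap at the first two positions.
        double[] v = {0.2, 0.2, 0.4};
        double[] w = {0.7, 0.7, 0.0};
        System.out.println(similarityOnOverlap(v, w));

        // Special case with a single overlapping component.
        double[] v1 = {0.2};
        double[] w1 = {0.7};
        System.out.println(similarityOnOverlap(v1, w1));
    }
}
```

Under these assumptions the result is well defined in both cases: roughly 0.586 for the test vectors and roughly 0.667 for the single overlap, because only the raw differences enter the distance and nothing is divided by a variance.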

Regards,
Mattias

-- 
--------------------------------
Mattias Hilliges
Softwareentwicklung
Forschung und Entwicklung

neofonie
Technologieentwicklung und
Informationsmanagement GmbH
Robert-Koch-Platz 4
10115 Berlin
fon: +49.30 24627 100
fax: +49.30 24627 120
mattias.hilli...@neofonie.de
http://www.neofonie.de 

Handelsregister
Berlin-Charlottenburg: HRB 67460

Geschaeftsfuehrung
Helmut Hoffer von Ankershoffen
(Sprecher der Geschaeftsfuehrung)
Nurhan Yildirim
--------------------------------

1,1,0.2
1,2,0.7
2,1,0.2 
2,2,0.7
3,1,0.4
/*
 * (c) neofonie Technologieentwicklung und Informationsmanagement GmbH
 *
 * This computer program is the sole property of neofonie GmbH
 * (http://www.neofonie.de) and is protected under the German Copyright Act
 * (paragraph 69a UrhG). All rights are reserved. Making copies,
 * duplicating, modifying, using or distributing this computer program
 * in any form, without prior written consent of neofonie, is
 * prohibited. Violation of copyright is punishable under the
 * German Copyright Act (paragraph 106 UrhG). Removing this copyright
 * statement is also a violation.
 */
package de.neofonie.recommendation.system.connectors.businesslogic;

import static org.junit.Assert.assertEquals;

import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
import org.junit.Test;

/**
 * @author hilli...@neofonie.de
 */
public class TestAbstractSimilarity {

    /**
     * This test documents a problem with Mahout: Let v=(v1, v2, ..., vn) and w=(w1, w2, ..., wm) be the
     * vectors that are used to compute the similarity between two items/users. If all vi that overlap with
     * w (that is, vi!=0 and wi!=0) are equal, and all wj that overlap with v are equal, no similarity can
     * be computed.<br>
     * This test considers the following vectors: v=(0.2, 0.2, 0.4) and w=(0.7, 0.7, 0).
     * The overlapping vector components of v are all 0.2. The overlapping components of w are all 0.7.
     */
    @Test
    public void testComponentsEqual() throws Exception {
        DataModel model = new FileDataModel(new File("src/test/resources/abstractSimilarity.csv"));
        ItemSimilarity similarity = new EuclideanDistanceSimilarity(model);
        GenericItemBasedRecommender recommender = new GenericItemBasedRecommender(model, similarity);
        List<RecommendedItem> recommendations = recommender.mostSimilarItems(1, 1);

        assertEquals(1, recommendations.size());
        RecommendedItem firstRecommendation = recommendations.get(0);
        assertEquals(2L, firstRecommendation.getItemID());
    }
}
