It sounds fine. At the moment the distributed recommender caps the number of co-occurrences (similarities) recorded at 100 per item. I should make that configurable. And it takes the top 10 user preferences by magnitude when making a recommendation. You might use the same defaults but also make them configurable.
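To make the "cap at N per item, ranked by magnitude" idea concrete, here is a minimal standalone sketch of that truncation step. The class and method names are illustrative only, not the actual Mahout code:

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical sketch (not Mahout API): keep only the top-N similar items
// for one item, ranked by the magnitude of the similarity score.
public class TopNSimilarities {

    public static Map<Long, Double> topN(Map<Long, Double> sims, int n) {
        return sims.entrySet().stream()
            // sort by |similarity|, largest first
            .sorted((a, b) -> Double.compare(Math.abs(b.getValue()),
                                             Math.abs(a.getValue())))
            .limit(n)
            // LinkedHashMap preserves the magnitude ordering
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                                      (x, y) -> x, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        Map<Long, Double> sims = new HashMap<>();
        sims.put(1L, 0.9);
        sims.put(2L, -0.95);
        sims.put(3L, 0.1);
        // With n = 2, items 2 and 1 survive; item 3 is dropped.
        System.out.println(topN(sims, 2).keySet());
    }
}
```

Making both N values configurable would just mean threading job parameters (e.g. a `maxSimilaritiesPerItem`-style option) through to this truncation.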
On Tue, Jun 1, 2010 at 6:56 AM, Sebastian Schelter <[email protected]> wrote:
> I'm reading this discussion with great interest.
>
> As you stress the importance of keeping the item-similarity matrix sparse,
> I think it would be a useful improvement to add an option like
> "maxSimilaritiesPerItem" to
> o.a.m.cf.taste.hadoop.similarity.item.ItemSimilarityJob, which would make
> it try to cut down the number of similar items per item.
>
> However, as we store each similarity pair only once, it could happen that
> there are more than "maxSimilaritiesPerItem" similar items for a single
> item, as we can't drop some of the pairs because the other item in the
> pair might have too few similarities otherwise.
>
> I could add this feature if you agree that it's useful this way.
>
> If one wishes to drop similarities below a certain cutoff, this could be
> done in a custom implementation of
> o.a.m.cf.taste.hadoop.similarity.DistributedItemSimilarity by simply
> returning NaN if the computed similarity is below that cutoff value.
>
> -sebastian
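The NaN-cutoff idea Sebastian describes can be sketched as follows. This is a simplified illustration, not the actual DistributedItemSimilarity interface; the cosine formula and parameter names are assumptions for the example:

```java
// Hedged sketch of dropping weak pairs by returning NaN: any similarity
// below the cutoff is reported as NaN so the pair can be omitted from
// the similarity matrix. Interface simplified for illustration.
public class ThresholdedSimilarity {

    private final double cutoff;

    public ThresholdedSimilarity(double cutoff) {
        this.cutoff = cutoff;
    }

    // Cosine similarity from a precomputed dot product and vector norms;
    // returns NaN when the result falls below the cutoff.
    public double similarity(double dotProduct, double normA, double normB) {
        double sim = dotProduct / (normA * normB);
        return sim < cutoff ? Double.NaN : sim;
    }
}
```

A caller (or the Hadoop job emitting pairs) would then simply skip any pair whose similarity is NaN.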
