It sounds fine. At the moment the distributed recommender caps the
number of co-occurrences (similarities) recorded at 100 per item. I
should make that configurable. And it takes the top 10 user
preferences by magnitude when making a recommendation. You might use
the same defaults but also make them configurable.
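For illustration, the "top N by magnitude" capping could look something like the sketch below. This is a hypothetical standalone helper, not Mahout's actual code; the class and method names are made up, and the real distributed recommender works on Hadoop writables rather than plain arrays.

```java
import java.util.Arrays;

public class SimilarityCap {

    // Keep only the top maxPerItem similarity values, ranked by absolute
    // value ("magnitude"), so strong negative similarities survive too.
    // Hypothetical sketch; Mahout's implementation differs.
    static double[] capTopByMagnitude(double[] sims, int maxPerItem) {
        Double[] boxed = new Double[sims.length];
        for (int i = 0; i < sims.length; i++) {
            boxed[i] = sims[i];
        }
        // Sort descending by absolute similarity.
        Arrays.sort(boxed, (a, b) -> Double.compare(Math.abs(b), Math.abs(a)));
        int n = Math.min(maxPerItem, boxed.length);
        double[] capped = new double[n];
        for (int i = 0; i < n; i++) {
            capped[i] = boxed[i];
        }
        return capped;
    }

    public static void main(String[] args) {
        // -0.95 has the largest magnitude, then 0.9; 0.1 is dropped.
        double[] top2 = capTopByMagnitude(new double[] {0.9, -0.95, 0.1}, 2);
        System.out.println(Arrays.toString(top2)); // [-0.95, 0.9]
    }
}
```

The same idea applies to both caps mentioned above: 100 co-occurrences per item and the top 10 user preferences, with the limit exposed as a configurable parameter instead of a hard-coded constant.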

On Tue, Jun 1, 2010 at 6:56 AM, Sebastian Schelter
<[email protected]> wrote:
> I'm reading this discussion with great interest.
>
> As you stress the importance of keeping the item-similarity-matrix sparse, I
> think it would be a useful improvement to add an option like
> "maxSimilaritiesPerItem" to
> o.a.m.cf.taste.hadoop.similarity.item.ItemSimilarityJob, which would make it
> try to cut down the number of similar items per item.
>
> However, since we store each similarity pair only once, an item could
> still end up with more than "maxSimilaritiesPerItem" similar items: we
> can't drop some of the pairs, because the other item in the pair might
> be left with too few similarities otherwise.
>
> I could add this feature if you agree that it's useful this way.
>
> If one wishes to drop similarities below a certain cutoff, this could be
> done in a custom implementation of
> o.a.m.cf.taste.hadoop.similarity.DistributedItemSimilarity by simply
> returning NaN if the computed similarity is below that cutoff value.
>
> -sebastian
>
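The cutoff idea in the quoted message would amount to something like the following. This is only a sketch of the convention (return NaN to signal "drop this pair"); the class name and method signature here are invented for illustration and do not match DistributedItemSimilarity's actual interface.

```java
// Hypothetical sketch: wrap a computed similarity and return NaN when it
// falls below a cutoff, so downstream code discards that item pair.
public class ThresholdedSimilarity {

    private final double cutoff;

    public ThresholdedSimilarity(double cutoff) {
        this.cutoff = cutoff;
    }

    // NaN marks the pair as "not similar enough to keep".
    public double apply(double rawSimilarity) {
        return rawSimilarity < cutoff ? Double.NaN : rawSimilarity;
    }

    public static void main(String[] args) {
        ThresholdedSimilarity sim = new ThresholdedSimilarity(0.5);
        System.out.println(sim.apply(0.3)); // NaN: pair is dropped
        System.out.println(sim.apply(0.8)); // 0.8: pair is kept
    }
}
```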
