Hi,
I ran into some issues with GenericItembasedRecommender this week, which
I could only work around by creating a custom ItembasedRecommender
implementation. I think the issues might be worth discussing here and
I'd be happy to contribute my changes back if we find them useful.
The first issue is with
GenericItembasedRecommender.MultiMostSimilarEstimator, which is used to
compute the most similar items to a collection of items. The current
implementation filters out every candidate item that has NaN as its
similarity value to at least one of the input items, so only items that
are similar to all input items survive. While this might be
algorithmically correct, it very often leads to empty results. Users
might e.g. put very different things in a shopping cart and using those
things as input for mostSimilarItems produces empty results in lots of
cases in my experience. My workaround was to interpret NaN as 0 when
computing the average estimate here (and in the end filtering out
results that had 0 as average), thus allowing an item to be included in
the result if it is similar to at least one of the input items. If we
decide to include this we could either introduce a second
mostSimilarItems method or make it receive a parameter to determine the
"exclusion mode" or whatever we might call it.
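To make the difference concrete, here is a minimal stand-alone sketch of
the two averaging modes. It is not the Mahout code: the class and method
names are made up for illustration, and it only assumes that the
similarities of one candidate item to all input items arrive as a
double[] in which NaN means "no similarity known".

```java
// Hypothetical sketch, not part of Mahout. Illustrates the two ways of
// averaging a candidate's similarities to several input items.
final class MultiSimilarityAverageSketch {

  private MultiSimilarityAverageSketch() {
  }

  /**
   * Strict mode: a single NaN makes the whole estimate NaN, so the
   * candidate is dropped unless it is similar to ALL input items.
   */
  static double strictEstimate(double[] similarities) {
    if (similarities.length == 0) {
      return Double.NaN;
    }
    double sum = 0.0;
    for (double similarity : similarities) {
      sum += similarity; // NaN propagates through the sum
    }
    return sum / similarities.length;
  }

  /**
   * Lenient mode: NaN is counted as 0. An average of exactly 0 is
   * reported as NaN so that the caller can still filter it out.
   */
  static double lenientEstimate(double[] similarities) {
    if (similarities.length == 0) {
      return Double.NaN;
    }
    double sum = 0.0;
    for (double similarity : similarities) {
      sum += Double.isNaN(similarity) ? 0.0 : similarity;
    }
    double average = sum / similarities.length;
    return average == 0.0 ? Double.NaN : average;
  }
}
```

With similarities {0.8, NaN} the strict variant yields NaN (candidate
dropped), while the lenient variant yields 0.4 (candidate kept, but
ranked below items that are similar to both inputs).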
The second issue is a little bit more complicated. A while ago we
introduced a component called CandidateItemsStrategy to enable the
customization of the selection of the initial candidate items that might
be recommended to a user. I noticed that we actually should do the same
thing with the selection of candidate items for mostSimilarItems, which
is currently done in
GenericItembasedRecommender.doMostSimilarItems(...). The current
candidate selection wastes CPU time, especially when we use precomputed
similarities
(GenericItemSimilarity or FileItemSimilarity) because we already "know"
the possibly similar items. Unfortunately there's no way to ask
ItemSimilarity to directly give you all items similar to a given item
(which would be the most efficient approach when dealing with
precomputed similarities). I created a small file-based indexing
component which can be asked for those, but I'm not too happy with
spreading the information about the precomputed similarities across
several components. Still, I think we should work on improving the
efficiency here, as it turned out to be a performance killer in my use
case.
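As a rough illustration of the kind of lookup I mean, here is a minimal
in-memory sketch (Java 8+). Everything in it is hypothetical: the class
and its methods are not existing Mahout API, and it assumes the
precomputed similarities are available as symmetric (itemA, itemB) pairs.
The candidates for mostSimilarItems are then simply the union of the
known neighbours of the input items, without the input items themselves.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not part of Mahout. Maps each item to the items
// it has a precomputed similarity with.
final class PrecomputedSimilarItemsIndex {

  private final Map<Long, Set<Long>> similarItems = new HashMap<>();

  /** Registers one precomputed pair; similarity is assumed symmetric. */
  void addPair(long itemA, long itemB) {
    similarItems.computeIfAbsent(itemA, key -> new HashSet<>()).add(itemB);
    similarItems.computeIfAbsent(itemB, key -> new HashSet<>()).add(itemA);
  }

  /**
   * Returns the candidate items for the given input items: the union of
   * their known neighbours, excluding the input items themselves.
   */
  Set<Long> candidatesFor(long... itemIDs) {
    Set<Long> candidates = new HashSet<>();
    for (long itemID : itemIDs) {
      candidates.addAll(
          similarItems.getOrDefault(itemID, Collections.emptySet()));
    }
    for (long itemID : itemIDs) {
      candidates.remove(itemID);
    }
    return candidates;
  }
}
```

Only the returned candidates would then need to be scored, instead of
every item that co-occurs with the input items in the data model.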
I hope I have made clear what the problems were (and what solutions I
propose). I could also supply a patch in the coming weeks, but I wanted
to have a discussion first.
--sebastian
- feature requests regarding GenericItembasedRecommender Sebastian Schelter