Yes, you estimate a preference for each of those items, in both user-based and item-based recommendation. In item-based recommendation, the estimate is a weighted average: the user's preferences for the items they have already rated, each weighted by that item's similarity to the given item.
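To make the weighted average concrete, here is a rough sketch in Python. It is not the library's actual code. The names `estimate_preference`, `user_prefs` and `similarity` are made up for illustration, and it assumes similarities are non-negative (negative weights would need extra care).

```python
def estimate_preference(user_prefs, similarity, candidate):
    """Estimate the user's preference for `candidate` as a
    similarity-weighted average of the user's existing preferences.

    user_prefs: dict mapping item -> preference value (hypothetical layout)
    similarity: function (item_a, item_b) -> float, or None if undefined
    """
    weighted_sum = 0.0
    total_weight = 0.0
    # Only the user's own preferred items are compared to the candidate;
    # no neighborhood selection is involved.
    for item, pref in user_prefs.items():
        sim = similarity(candidate, item)
        if sim is None:
            continue  # no similarity known for this pair
        weighted_sum += sim * pref
        total_weight += sim
    if total_weight == 0.0:
        return None  # nothing to base an estimate on
    return weighted_sum / total_weight


# Toy data, invented for this example.
prefs = {"A": 5.0, "B": 3.0}
sims = {("C", "A"): 0.9, ("C", "B"): 0.1}


def toy_similarity(a, b):
    return sims.get((a, b))


estimate = estimate_preference(prefs, toy_similarity, "C")
```

Here the estimate for C comes out close to 4.8, pulled toward the user's preference for A because C is much more similar to A than to B.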
In that case you don't need a neighborhood. The items of interest are the user's preferred items -- and you want to use all of them, not a subset.

It's not quite symmetrical with user-based recommendation, which is based on user similarity. There, you need to constrain yourself to examine only a subset of all users, a neighborhood, or else it would be wildly inefficient. But in item-based recommendation you don't have this issue. *Given an item*, you already know the very small number of items it needs to be compared to -- the user's preferred items. That takes the place of a neighborhood in a sense.

You could say, well, then the problem is elsewhere: how can considering all possible items for recommendation be efficient? If we use neighborhoods to get around that in user-based recommendation, why not in item-based?

In fact the algorithm doesn't actually look at every item -- it constructs the set of items that are connected in any way to an item the user prefers, in order to rule out most items that can't possibly be recommended. In that sense a 'neighborhood' does come into play: the set of all items considered is really the union of the maximal neighborhoods around each item that the user prefers. That's a big neighborhood, and if this is what you mean, you are correct that you could reasonably add parameters to constrain it.

The reasons you may not want to do that are:

1) Item similarity is often 'fast', in that it is sometimes precomputed based on outside information. So sorting through a lot of potential items doesn't hurt much.

2) It's not part of the canonical item-based algorithm, but that's not a great reason.

3) Computing this neighborhood gets expensive: it must be defined based on distance to all items in the set, not just one. That is, an item being far from or near to one preferred item doesn't mean anything by itself. What matters is how close it is to the whole set. By the time you're computing that... you might as well just use the canonical algorithm.
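The candidate set described above (everything connected to at least one item the user prefers) might be built roughly like this. This is only a sketch under assumptions: it treats two items as "connected" when some other user has a preference for both, and the function name `candidate_items` and the `all_prefs` layout are invented for the example. The real implementation may define the connection differently.

```python
def candidate_items(all_prefs, user):
    """Return items worth estimating for `user`: items co-rated with
    any item the user already prefers, minus the user's own items.

    all_prefs: dict mapping user -> dict of item -> preference
               (hypothetical layout for this sketch)
    """
    preferred = set(all_prefs[user])
    candidates = set()
    for other, prefs in all_prefs.items():
        if other == user:
            continue
        # If this user shares at least one item with our user, every
        # item they rated is connected to something our user prefers.
        if not preferred.isdisjoint(prefs):
            candidates.update(prefs)
    # Items the user already has a preference for are not candidates.
    return candidates - preferred


# Toy data, invented for this example.
all_prefs = {
    "u1": {"A": 5.0, "B": 3.0},
    "u2": {"A": 4.0, "C": 2.0},
    "u3": {"D": 5.0, "E": 1.0},
    "u4": {"B": 2.0, "D": 4.0},
}

result = candidate_items(all_prefs, "u1")
```

For u1 this yields C and D. E is never considered, because nobody who rated A or B also rated E, so there is no path from E to anything u1 prefers.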
On Sat, Feb 20, 2010 at 11:22 AM, jamborta <[email protected]> wrote:
>
> but as far as I understand your implementation you take user1 and then get
> all the items that the user hasn't rated (getAllOtherItems()) and generate
> recommendation for each of these items. therefore, you have user1 item1,
> user1 item2, etc as input. so the neighbourhood can be restricted for each
> of these items.
>
> Tamas
