Gram-Schmidt doesn't have to change vectors. You can view it as a way of selecting from an infinite number of vectors in order to get an orthonormal basis. The task of getting an interestingly diverse set of recommendations is a bit different in that we only have a finite number of items to recommend and in that orthonormality doesn't really apply. Another way to look at it is as greedy set-cover.
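To make the greedy set-cover view concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that the ideal recommendation and each item are sets of features; the function name `greedy_cover` and the toy data are hypothetical and not taken from any library.

```python
def greedy_cover(target, items, k):
    """Greedily pick up to k items that cover as much of `target` as possible.

    target: set of features the ideal recommendation should satisfy (assumed).
    items:  dict mapping item name -> set of features (assumed representation).
    Returns (chosen item names in pick order, features still uncovered).
    """
    remaining = set(target)
    candidates = dict(items)
    chosen = []
    while remaining and candidates and len(chosen) < k:
        # Take the item covering the most still-uncovered features;
        # sorting the names makes tie-breaking deterministic.
        best = max(sorted(candidates), key=lambda name: len(candidates[name] & remaining))
        if not candidates[best] & remaining:
            break  # nothing left adds new coverage
        chosen.append(best)
        # "Subtraction" here is set subtraction of the covered features.
        remaining -= candidates.pop(best)
    return chosen, remaining


target = {"jazz", "piano", "live", "1960s"}
items = {
    "a": {"jazz", "piano"},
    "b": {"jazz", "piano", "blues"},
    "c": {"live", "1960s"},
    "d": {"rock"},
}
chosen, uncovered = greedy_cover(target, items, k=3)
print(chosen, uncovered)
```

In this toy data, item b overlaps the target as much as item a does, but once a is picked b adds nothing new, so the second pick goes to c instead. That is the diversity effect the set-cover view is after.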
If you view the idealized recommendation as a vector r, then you can select the most collinear item x_0 as the best recommendation. There will be some aspects of r, however, that x_0 does not satisfy. Thus, you can take x_1 to be the item most collinear with (r - x_0). If r and x_i are binary vectors, then the subtraction has to behave more like set subtraction, but if they are in a reduced-dimensional representation like that from LSA, normal subtraction may work with some renormalization. x_2 can then be the item most collinear with (r - x_0 - x_1). You may want to take two or more items at each step before looking for diversity.

Another fairly direct method is to just use a threshold: iterate down the list of items that are similar to r and keep only those that are at least a certain minimum dissimilarity from all previously selected items.

Analogies from linear algebra can be very misleading for this sort of work (this is a borderline case), or very helpful (as with LSA). Usually what you need to do is take the analogy with a big grain of salt and then re-imagine the problem. A good example is the interpretation of LSA, LDA, MDCA and many other techniques as matrix decompositions under different probabilistic assumptions.

On Sun, Jun 21, 2009 at 6:01 PM, Sean Owen <[email protected]> wrote:

> I am particularly intrigued at the moment by this last question, of
> how to pick a sample of very different items. Is the idea here that
> you look at items as vectors of preferences, and try to find the
> most-orthogonal subset of them? Gram-Schmidt would be changing the
> vectors rather than selecting them, so I am curious how these two
> things connect. It is a really good problem I think.
>

--
Ted Dunning, CTO
DeepDyve
111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
http://www.deepdyve.com
858-414-0013 (m)
408-773-0220 (fax)
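A rough sketch of the two selection strategies described in this message, for the dense-vector case (for example, LSA-style representations). Everything here is an assumption for illustration: the function names, the toy vectors, the use of cosine similarity as the measure of collinearity, and in particular the choice of "renormalization", which here scales each selected item to its projection onto the residual before subtracting it.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def residual_select(r, items, k):
    """Pick the item most collinear with what is left of r, then subtract it."""
    residual = list(r)
    pool = dict(items)
    chosen = []
    for _ in range(min(k, len(pool))):
        best = max(sorted(pool), key=lambda name: cosine(residual, pool[name]))
        x = pool.pop(best)
        chosen.append(best)
        # Assumed renormalization: subtract the projection of the residual
        # onto x rather than x itself, so the scale of x does not matter.
        xx = dot(x, x)
        scale = dot(residual, x) / xx if xx else 0.0
        residual = [a - scale * b for a, b in zip(residual, x)]
    return chosen


def threshold_select(r, items, k, min_dissimilarity):
    """Walk items in order of similarity to r, skipping near-duplicates."""
    ranked = sorted(items, key=lambda name: cosine(r, items[name]), reverse=True)
    chosen = []
    for name in ranked:
        # Dissimilarity is taken to be 1 - cosine similarity (an assumption).
        if all(1.0 - cosine(items[name], items[c]) >= min_dissimilarity for c in chosen):
            chosen.append(name)
        if len(chosen) == k:
            break
    return chosen


r = [1.0, 1.0, 0.0]
items = {
    "a": [1.0, 0.2, 0.0],
    "b": [1.0, 0.3, 0.0],
    "c": [0.1, 1.0, 0.0],
    "d": [0.0, 0.0, 1.0],
}
by_residual = residual_select(r, items, k=2)
by_threshold = threshold_select(r, items, k=2, min_dissimilarity=0.3)
print(by_residual, by_threshold)
```

On this toy data, a is the second most similar item to r, but it is nearly parallel to b. Both sketches therefore pick b first and then c, passing over a. Taking two or more items per step before subtracting would be a small change to the loop in `residual_select`.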
