On Wed, Jun 9, 2010 at 12:47 AM, Jake Mannix <[email protected]> wrote:
> I've been out of the loop - the cf.hadoop jobs produce user vectors
> now? What is the format? SequenceFile<LongWritable,VectorWritable>?
Yes, exactly so.

> This is for subtracting off the biases? Be careful not to turn your
> user vectors into dense vectors in this step!

Yeah, is this step even needed? Ted indicated he also didn't exactly
see the motivation. Is the intuition that the predictions turn out to
have mean 0, so the real average rating has to be added back in? Well,
that certainly is the implication, but I don't see the intuitive reason
this is so. (Not that that's required.)

> A = U S V'
>
> you have A, and V. Then just take B = A V V' as your
> predictions: essentially you project A down into V-space
> (the space of "topics"), the lift back up with V', and you'll
> be in the space you started with - user ratings.

After retaining only k singular values we really have Uk Sk Vk', whose
product is Ak, an approximation of A. And yes, that matches my
understanding that the "delta" between Ak and A gives you the
recommendations - new non-zero values.

Did I read right that the process outputs all three of those? So it's
just multiplying them together, right? I guess I'm not following
B = A V V' ... isn't V V' = I, so what is that doing?
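For what it's worth, here is a small NumPy sketch of the V V' question. It is only an illustration on a random toy matrix: the names A, Vk and k and the 6 x 4 shape are made up here, and none of this is output from the Mahout job. The point it checks is that V V' = I holds for the full square V, but once only k columns are kept, Vk' Vk is the k x k identity while Vk Vk' is a projection, so A Vk Vk' comes out as Ak rather than A.

```python
import numpy as np

# Hypothetical toy data: 6 users x 4 items, dense for simplicity.
rng = np.random.default_rng(0)
A = rng.random((6, 4))

# Thin SVD: A = U S V'
U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T  # columns are the right singular vectors

# Keep only the top k singular values/vectors.
k = 2
Uk, sk, Vk = U[:, :k], s[:k], V[:, :k]

Ak = Uk @ np.diag(sk) @ Vk.T   # rank-k approximation Uk Sk Vk'
B = A @ Vk @ Vk.T              # project into "topic" space, then lift back

full_VVt_is_identity = np.allclose(V @ V.T, np.eye(V.shape[0]))
VktVk_is_identity = np.allclose(Vk.T @ Vk, np.eye(k))
VkVkt_is_identity = np.allclose(Vk @ Vk.T, np.eye(V.shape[0]))
B_equals_Ak = np.allclose(B, Ak)

print(full_VVt_is_identity, VktVk_is_identity, VkVkt_is_identity, B_equals_Ak)
```

If that reading is right, B = A V V' only makes sense with the truncated V, and it gives the same predictions as multiplying Uk Sk Vk' together, without needing U or S at all.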
