On 09/06/10 00:32, Jake Mannix wrote:
I've been out of the loop - the cf.hadoop jobs produce user vectors
now?  What is the format?  SequenceFile<LongWritable,VectorWritable>?
Yes exactly so.
Too bad it's not IntWritable,VectorWritable, because then we'd actually
be able to transpose() properly - our Vector implementation assumes int
indices. :\

You can do decomposition on this, because transpose isn't needed.  But
if one of the later steps needs it...


Doesn't the MatrixMultiplicationJob, and therefore decomposition, also assume int? If so, is the conversion fairly easy, or would it be a case of writing a separate input-conversion M/R job?



On 09/06/10 00:47, Ted Dunning wrote:
It is often best to carry this offset around through the computation to be
applied as late as possible for just this reason.

On Tue, Jun 8, 2010 at 3:47 PM, Jake Mannix wrote:

This is for subtracting off the biases?  Be careful not to turn your
user vectors into dense vectors in this step!

@Ted
Yeah, the biases would be applied at the last step of the prediction.
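
To make the "apply the offset late, keep the vectors sparse" point concrete, here is a minimal sketch in plain Java (not Mahout code). It assumes the bias is simply the mean of a user's observed ratings, which is only one possible choice; the class name LateBias, its method names, and the item IDs are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class LateBias {
    // Mean of the observed ratings only; unrated items are not treated as zeros.
    static double meanOfObserved(Map<Integer, Double> ratings) {
        if (ratings.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (double r : ratings.values()) {
            sum += r;
        }
        return sum / ratings.size();
    }

    // Subtract the offset from the stored entries only, so the vector
    // keeps exactly the same non-zero pattern and stays sparse.
    static Map<Integer, Double> centre(Map<Integer, Double> ratings, double offset) {
        Map<Integer, Double> centred = new HashMap<>();
        for (Map.Entry<Integer, Double> e : ratings.entrySet()) {
            centred.put(e.getKey(), e.getValue() - offset);
        }
        return centred;
    }

    // The offset is carried alongside the vector and added back at the very end.
    static double predict(double residualEstimate, double offset) {
        return residualEstimate + offset;
    }

    public static void main(String[] args) {
        Map<Integer, Double> ratings = new HashMap<>();
        ratings.put(10, 4.0); // hypothetical item IDs and ratings
        ratings.put(42, 2.0);

        double offset = meanOfObserved(ratings);
        Map<Integer, Double> centred = centre(ratings, offset);

        System.out.println("offset = " + offset);
        System.out.println("stored entries = " + centred.size());
        System.out.println("prediction = " + predict(0.5, offset));
    }
}
```

The centred vector has as many stored entries as the original one. Subtracting the offset from every item, including the unrated ones, is what would make it dense.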

@Jake and Sean
My understanding is that adding the biases and the average rating back to the prediction depends on what normalisation is done before the SVD computation. On that topic, could someone clarify the difference between normalisation and regularisation for me, and also where/if the two interact?



On 09/06/10 00:32, Jake Mannix wrote:
1) A ~= B = U S V'
2) A*V = U S V' V = U S
3) A*V*S^-1 = U

That's how you get U from A, V, and S.  Now reconstruct A out of
U, V and S, using line 3) above to substitute for U in U * S * V' :

U * S * V' = (A * V * S^-1) * S * V' = A * V * V'

Which means you can avoid ever really calculating the full U
matrix at all.

In fact, you can compute A * V * V' in one map-reduce pass
(assuming you can hold V in memory, which might not be
possible, depending on the rank and number of items) :

for(Vector row : A) {
   outputCollector.collect(V.timesSquared(row));
}
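
Spelled out with plain arrays, the loop above amounts to something like the sketch below. It assumes that timesSquared(v) on a matrix M means M' * (M * v), and that V is held in memory as one row per singular vector (rank rows, each as long as the number of items), so that the result is V * V' applied to a row of A. The class name ProjectRows and the toy data are made up for illustration; this is not the Mahout implementation.

```java
import java.util.Arrays;

public class ProjectRows {
    // Computes M' * (M * v) in one pass over the rows of M,
    // without ever forming the product M' * M.
    static double[] timesSquared(double[][] m, double[] v) {
        double[] result = new double[v.length];
        for (double[] row : m) {
            // dot = (row . v), one component of M * v
            double dot = 0.0;
            for (int j = 0; j < v.length; j++) {
                dot += row[j] * v[j];
            }
            // accumulate dot * row, that row's contribution to M' * (M * v)
            for (int j = 0; j < v.length; j++) {
                result[j] += dot * row[j];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy "V": two orthonormal singular vectors over three items.
        double[][] v = {
            {1.0, 0.0, 0.0},
            {0.0, 1.0, 0.0}
        };
        // Toy "A": two user rows over the same three items.
        double[][] a = {
            {5.0, 3.0, 4.0},
            {1.0, 0.0, 2.0}
        };
        for (double[] row : a) {
            // Each output row is that user's ratings projected onto span(V).
            System.out.println(Arrays.toString(timesSquared(v, row)));
        }
    }
}
```

With this toy V the third item lies outside the retained subspace, so its component is dropped from every projected row.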

Nice to see it can be done with just A and V. Could you talk me through what timesSquared does? I can't quite tell from the source.

Thanks for all the info!

-Richard
