On Wed, Jun 9, 2010 at 11:57 AM, Richard Simon Just <[email protected]> wrote:
> On 09/06/10 00:32, Jake Mannix wrote:
>>>> I've been out of the loop - the cf.hadoop jobs produce user vectors
>>>> now? What is the format? SequenceFile<LongWritable,VectorWritable>?
>>>
>>> Yes exactly so.
>>
>> Too bad it's not IntWritable,VectorWritable, because then we'd actually
>> be able to transpose() properly - our Vector implementation assumes int
>> indices. :\
>>
>> You can do decomposition on this, because transpose isn't needed. But
>> if one of the later steps needs it...
>
> Doesn't the MatrixMultiplicationJob, and therefore decomposition, also
> assume int? If so, is the conversion fairly easy, or would it be a case
> of writing a separate input-conversion M/R?

MatrixMultiplicationJob assumes int, but SVD does not, as it turns out,
because we never actually look at what defines the row space - only the
column space is operated on. In fact, I think the LanczosDecompositionJob
can run on any SequenceFile<Writable,VectorWritable>, with any Writable
type as key.

> On 09/06/10 00:47, Ted Dunning wrote:
> @Jake and Sean
>
> My understanding is that the adding of biases and average rating to the
> prediction is based on what is done in terms of normalization before the
> SVD computation. On that topic, could someone clarify the difference
> between normalization and regularization for me? And also where/if the
> two interact?

I'm not sure what kind of regularization we're doing here, actually...

> Nice to see it can be done with just A and V. Could you talk me through
> what timesSquared does? I can't quite tell from the src.

matrix.timesSquared(Vector v) == (matrix.transpose().times(matrix)).times(v),
computed in one pass over the matrix, without ever forming the transpose
or multiplying two matrices (on a DistributedRowMatrix, this is done in
one MapReduce pass over the data).

-jake
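To make the timesSquared identity concrete, here is a minimal sketch using plain Java arrays rather than Mahout's Vector/DistributedRowMatrix API (the class and method names here are illustrative, not Mahout's). The trick is that (A^T A) v can be accumulated as the sum over rows of (a_i . v) * a_i, so a single pass over the rows suffices, with no transpose and no matrix-matrix product:

```java
public class TimesSquaredSketch {

  // One pass over the rows of A: returns (A^T A) v, accumulated as
  // sum_i (a_i . v) * a_i, without ever forming A^T or A^T A.
  static double[] timesSquared(double[][] a, double[] v) {
    double[] y = new double[v.length];
    for (double[] row : a) {
      double dot = 0.0;                    // a_i . v
      for (int j = 0; j < v.length; j++) {
        dot += row[j] * v[j];
      }
      for (int j = 0; j < v.length; j++) {
        y[j] += dot * row[j];              // y += (a_i . v) * a_i
      }
    }
    return y;
  }

  public static void main(String[] args) {
    double[][] a = {{1, 2}, {3, 4}, {5, 6}};
    double[] v = {1, -1};
    double[] y = timesSquared(a, v);
    // A^T A = [[35, 44], [44, 56]], so (A^T A) v = [-9, -12]
    System.out.println(y[0] + " " + y[1]);
  }
}
```

In the distributed version each mapper holds some rows a_i and emits its partial sum of (a_i . v) * a_i, and a single reduce adds the partial vectors, which is why one MapReduce pass is enough.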
