On Wed, Jun 9, 2010 at 11:57 AM, Richard Simon Just <[email protected]> wrote:
> On 09/06/10 00:32, Jake Mannix wrote:
>>>> I've been out of the loop - the cf.hadoop jobs produce user vectors
>>>> now? What is the format? SequenceFile<LongWritable,VectorWritable>?
>>>
>>> Yes exactly so.
>>
>> Too bad it's not IntWritable,VectorWritable, because then we'd actually
>> be able to transpose() properly - our Vector implementation assumes int
>> indices. :\
>>
>> You can do decomposition on this, because transpose isn't needed. But
>> if one of the later steps needs it...
>
> Doesn't the MatrixMultiplicationJob, and therefore decomposition, also
> assume int? If so, is the conversion fairly easy, or would it be a case
> of writing a separate input-conversion M/R?

MatrixMultiplicationJob assumes int, but SVD does not, as it turns out,
because we never actually look at what defines the row space - only the
column space is operated on. In fact, I think the LanczosDecompositionJob
can run on any SequenceFile<Writable,VectorWritable>, with any Writable
type as key.

> On 09/06/10 00:47, Ted Dunning wrote:
> @Jake and Sean
>
> My understanding is that the adding of biases and average rating to the
> prediction is based on what is done in terms of normalization before the
> SVD computation. On that topic, could someone clarify the difference
> between normalization and regularization for me? And also where/if the
> two interact?

I'm not sure what kind of regularization we're doing here, actually...

> Nice to see it can be done with just A and V. Could you talk me through
> what timesSquared does? I can't quite tell from the src.

matrix.timesSquared(Vector v) == (matrix.transpose().times(matrix)).times(v),
computed in one pass over the matrix, without ever forming the transpose
or multiplying two matrices (on a DistributedRowMatrix, this is done in
one MapReduce pass over the data).

-jake
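To make the timesSquared identity concrete, here is a minimal sketch using plain Java arrays rather than Mahout's Vector/DistributedRowMatrix API (the class and method names here are illustrative, not Mahout's). The trick is that (A^T A) v can be accumulated as the sum over rows of (a_i . v) * a_i, so a single pass over the rows suffices, with no transpose and no matrix-matrix product:

```java
public class TimesSquaredSketch {

  // One pass over the rows of A: returns (A^T A) v, accumulated as
  // sum_i (a_i . v) * a_i, without ever forming A^T or A^T A.
  static double[] timesSquared(double[][] a, double[] v) {
    double[] y = new double[v.length];
    for (double[] row : a) {
      double dot = 0.0;                    // a_i . v
      for (int j = 0; j < v.length; j++) {
        dot += row[j] * v[j];
      }
      for (int j = 0; j < v.length; j++) {
        y[j] += dot * row[j];              // y += (a_i . v) * a_i
      }
    }
    return y;
  }

  public static void main(String[] args) {
    double[][] a = {{1, 2}, {3, 4}, {5, 6}};
    double[] v = {1, -1};
    double[] y = timesSquared(a, v);
    // A^T A = [[35, 44], [44, 56]], so (A^T A) v = [-9, -12]
    System.out.println(y[0] + " " + y[1]);
  }
}
```

In the distributed version each mapper holds some rows a_i and emits its partial sum of (a_i . v) * a_i, and a single reduce adds the partial vectors, which is why one MapReduce pass is enough.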
