On 10/06/10 17:42, Jake Mannix wrote:
> Oooh, you caught me in an ugly bit of code. The V output of
> EigenVerificationJob and DistributedLanczosSolver is yes, just a
> SequenceFile<IntWritable,VectorWritable>, where the ints (the keys) are row
> numbers (which run from 0 up to reducedRank [well, roughly]).
>
> S, on the other hand... is hackily encoded in the serialized "name" variable
> of the vector output of EigenVerificationJob. If you can think of a better
> place to a couple dozen to a couple hundred double values output from a
> Hadoop job, well, by all means, submit a patch and I'll tack it in there.
>
> If you dump the vectors to the screen with the vectordumper command line
> script, you'll see the values (but they're also printed to the console when
> you run EigenVerificationJob).
>
> -jake

Ah cool! So let me just check that I'm catching this correctly.

In EigenVerificationJob.saveCleanEigens, the values of S are encoded with 'meta.getValue()' as the second parameter when the EigenVectors ev are initialised?

And does the fourth parameter (s.index) correspond to the vector's row or column in the matrix?

Final question ;) Given the sequence file's vector vw, with index i, would the S value encoded in vw correspond to S_i,i?

Sorry about all that, but assuming I haven't completely misunderstood, I quite like the way that works.
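
To make that last question concrete, here is a minimal sketch in plain Java of the reading I have in mind. It is not Mahout code: KeyedVector and buildDiagonal are hypothetical names made up for illustration, and the assumption that the vector with key i carries S_i,i is exactly the thing I'm asking about, not something I've confirmed.

```java
import java.util.Arrays;
import java.util.List;

public class DiagonalFromVectors {

    // Hypothetical stand-in for one (key, vector) entry of the sequence file.
    // 'encodedValue' plays the role of the S value carried alongside the
    // vector; how Mahout actually stores it is not modelled here.
    static final class KeyedVector {
        final int index;
        final double encodedValue;
        final double[] entries;

        KeyedVector(int index, double encodedValue, double[] entries) {
            this.index = index;
            this.encodedValue = encodedValue;
            this.entries = entries;
        }
    }

    // Assumption under test: the vector with key i contributes S[i][i],
    // and every off-diagonal entry of S is zero.
    // Also assumes the keys are exactly 0..n-1 with no gaps.
    static double[][] buildDiagonal(List<KeyedVector> vectors) {
        int n = vectors.size();
        double[][] s = new double[n][n]; // Java zero-initialises the array
        for (KeyedVector v : vectors) {
            s[v.index][v.index] = v.encodedValue;
        }
        return s;
    }

    public static void main(String[] args) {
        // Made-up values, purely for illustration.
        List<KeyedVector> vectors = Arrays.asList(
            new KeyedVector(0, 3.5, new double[] {1.0, 0.0}),
            new KeyedVector(1, 1.25, new double[] {0.0, 1.0}));

        System.out.println(Arrays.deepToString(buildDiagonal(vectors)));
    }
}
```

If the keys really do run "roughly" from 0 to reducedRank, with gaps, then sizing S by the number of vectors would be wrong and the keys would need remapping first.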
