Re: Using SVD with Canopy/KMeans

Jeff Eastman Tue, 14 Sep 2010 10:45:36 -0700

Here's the new set of mahout svd arguments. Entries --cleansvd, --maxError, --minEigenvalue and --inMemory have been added in r997007. See the new tests in TestDistributedLanczosSolverCLI for examples of both forms:


  --input (-i) input                      Path to job input directory.
  --output (-o) output                    The directory pathname for output.
  --numRows (-nr) numRows                 Number of rows of the input matrix
  --numCols (-nc) numCols                 Number of columns of the input matrix
  --rank (-r) rank                        Desired decomposition rank (note:
                                          only roughly 1/4 to 1/3 of these will
                                          have the top portion of the spectrum)
  --symmetric (-sym) symmetric            Is the input matrix square and
                                          symmetric?
  --cleansvd (-cl) cleansvd               Run the EigenVerificationJob to clean
                                          the eigenvectors after SVD
  --maxError (-err) maxError              Maximum acceptable error
  --minEigenvalue (-mev) minEigenvalue    Minimum eigenvalue to keep the vector
                                          for
  --inMemory (-mem) inMemory              Buffer eigen matrix into memory (if
                                          you have enough!)
  --help (-h)                             Print out help
  --tempDir tempDir                       Intermediate output directory
  --startPhase startPhase                 First phase to run
  --endPhase endPhase                     Last phase to run
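
As a rough illustration of how the options listed above might be combined, here is a dry-run sketch that only assembles and prints a command line; it does not invoke Mahout. The paths, the matrix dimensions, the rank and the error/eigenvalue thresholds are all placeholder assumptions, and passing true/false as the values of --symmetric, --cleansvd and --inMemory is likewise an assumption based on the listing showing each of them taking an argument. The tests in TestDistributedLanczosSolverCLI remain the authoritative examples.

```shell
#!/bin/sh
# Dry run: build a "mahout svd" command line from the options listed above
# and print it. Nothing is executed against Hadoop or Mahout here.

# Hypothetical locations; substitute your own input and output directories.
INPUT=/tmp/svd/input
OUTPUT=/tmp/svd/output

# Required paths.
SVD_CMD="mahout svd --input $INPUT --output $OUTPUT"

# Placeholder matrix shape and rank. Per the --rank note, only roughly
# 1/4 to 1/3 of the requested vectors cover the top of the spectrum,
# so ask for more than you expect to keep.
SVD_CMD="$SVD_CMD --numRows 1000 --numCols 500 --rank 30"

# Assumed value syntax (true/false) for the boolean-style options.
SVD_CMD="$SVD_CMD --symmetric false"

# Options added in r997007: run the EigenVerificationJob after the solver
# and filter the cleaned eigenvectors. Threshold values are placeholders.
SVD_CMD="$SVD_CMD --cleansvd true --maxError 0.05 --minEigenvalue 0.0 --inMemory true"

echo "$SVD_CMD"
```

Dropping the last assignment gives the other form, the plain solver run without the cleaning step.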


On 9/14/10 6:55 AM, Jake Mannix wrote:

I guess the main thing I'd want to happen in combining EVJ and DLS is to
make sure that the final output (changing the semantics of the CLI param is
ok) is clear, with it either being the output of EVJ (if that is used), or
DLS (if EVJ is not used).  If that can be done, go for it!

   -jake

On Tue, Sep 14, 2010 at 6:30 AM, Jeff Eastman <[email protected]> wrote:

Jake, I see you are online. I'm inclined to push forward on this despite
the adjustments to DLS --output semantics. Agreed?


On 9/13/10 10:34 AM, Jeff Eastman wrote:

  r996599 completed the first part. Several additional arguments to EVJ.run
need to be added to DLS (maxError, minEigenValue, inMemory, also the
--cleansvd flag itself). Also DLS interprets --output as the
outputEigenVectorPath and not as the generic output directory so DLS.run()
will need another argument too. Still want to do this?

On 9/12/10 2:19 PM, Jake Mannix wrote:

+1 on folding EigenVerificationJob into DistributedLanczosSolver. Or, at
least implement a job() method on EVJ.

+1 for having the latter, with a boolean flag in DLS to optionally call EVJ
after it's done.
