Here's the new set of mahout svd arguments. The --cleansvd, --maxError, --minEigenvalue and --inMemory entries were added in r997007. See the new tests in TestDistributedLanczosSolverCLI for examples of both forms:

  --input (-i) input                      Path to job input directory.
  --output (-o) output                    The directory pathname for output.
  --numRows (-nr) numRows                 Number of rows of the input matrix
  --numCols (-nc) numCols                 Number of columns of the input
                                          matrix
  --rank (-r) rank                        Desired decomposition rank (note:
                                          only roughly 1/4 to 1/3 of these will
                                          have the top portion of the spectrum)
  --symmetric (-sym) symmetric            Is the input matrix square and
                                          symmetric?
  --cleansvd (-cl) cleansvd               Run the EigenVerificationJob to clean
                                          the eigenvectors after SVD
  --maxError (-err) maxError              Maximum acceptable error
  --minEigenvalue (-mev) minEigenvalue    Minimum eigenvalue to keep the vector
                                          for
  --inMemory (-mem) inMemory              Buffer eigen matrix into memory (if
                                          you have enough!)
  --help (-h)                             Print out help
  --tempDir tempDir                       Intermediate output directory
  --startPhase startPhase                 First phase to run
  --endPhase endPhase                     Last phase to run
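
For anyone who wants to try the options without opening the test class, here is a minimal sketch of how the argument arrays might be assembled, once without and once with the new cleanup options. It uses only flags from the help text above; the class name SvdArgsSketch, the paths, and every option value (including passing "true"/"false" to --symmetric, --cleansvd and --inMemory) are my assumptions for illustration, not copied from TestDistributedLanczosSolverCLI. It only builds and prints the arguments and does not call Mahout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Mahout: assembles "mahout svd" arguments.
public class SvdArgsSketch {

  // Builds an argument array using only flags listed in the help text.
  static String[] buildArgs(String input, String output, int numRows,
                            int numCols, int rank, boolean cleanSvd) {
    List<String> args = new ArrayList<>(Arrays.asList(
        "--input", input,
        "--output", output,
        "--numRows", String.valueOf(numRows),
        "--numCols", String.valueOf(numCols),
        "--rank", String.valueOf(rank),
        "--symmetric", "false"));
    if (cleanSvd) {
      // Options added in r997007; the values here are illustrative guesses.
      args.addAll(Arrays.asList(
          "--cleansvd", "true",
          "--maxError", "0.05",
          "--minEigenvalue", "0.0",
          "--inMemory", "false"));
    }
    return args.toArray(new String[0]);
  }

  public static void main(String[] unused) {
    // Plain decomposition, no eigenvector cleanup.
    System.out.println("mahout svd " + String.join(" ",
        buildArgs("/tmp/svd/input", "/tmp/svd/output", 100, 50, 10, false)));
    // Decomposition followed by the EigenVerificationJob cleanup.
    System.out.println("mahout svd " + String.join(" ",
        buildArgs("/tmp/svd/input", "/tmp/svd/output", 100, 50, 10, true)));
  }
}
```

The resulting array is the kind of thing that would be handed to the solver's command-line entry point; check the test class for the values actually exercised.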

On 9/14/10 6:55 AM, Jake Mannix wrote:
I guess the main thing I'd want to happen in combining EVJ and DLS is to
make sure that the final output (changing the semantics of the CLI param is
ok) is clear, with it either being the output of EVJ (if that is used), or
DLS (if EVJ is not used).  If that can be done, go for it!

   -jake

On Tue, Sep 14, 2010 at 6:30 AM, Jeff Eastman <[email protected]> wrote:

  Jake, I see you are online. I'm inclined to push forward on this despite
the adjustments to DLS --output semantics. Agreed?


On 9/13/10 10:34 AM, Jeff Eastman wrote:

  r996599 completed the first part. Several additional arguments to EVJ.run
need to be added to DLS (maxError, minEigenvalue, inMemory, also the
--cleansvd flag itself). Also, DLS interprets --output as the
outputEigenVectorPath and not as the generic output directory, so DLS.run()
will need another argument too. Still want to do this?

On 9/12/10 2:19 PM, Jake Mannix wrote:

+1 on folding EigenVerificationJob into DistributedLanczosSolver. Or, at
least implement a job() method on EVJ.

  +1 for having the latter, with a boolean flag in DLS to optionally call
EVJ after it's done.


