Here's the new set of mahout svd arguments. The --cleansvd, --maxError,
--minEigenvalue and --inMemory options were added in r997007. See the new
tests in TestDistributedLanczosSolverCLI for examples of both forms (with
and without --cleansvd):
  --input (-i) input                    Path to job input directory.
  --output (-o) output                  The directory pathname for output.
  --numRows (-nr) numRows               Number of rows of the input matrix
  --numCols (-nc) numCols               Number of columns of the input matrix
  --rank (-r) rank                      Desired decomposition rank (note:
                                        only roughly 1/4 to 1/3 of these will
                                        have the top portion of the spectrum)
  --symmetric (-sym) symmetric          Is the input matrix square and
                                        symmetric?
  --cleansvd (-cl) cleansvd             Run the EigenVerificationJob to clean
                                        the eigenvectors after SVD
  --maxError (-err) maxError            Maximum acceptable error
  --minEigenvalue (-mev) minEigenvalue  Minimum eigenvalue to keep the vector
                                        for
  --inMemory (-mem) inMemory            Buffer eigen matrix into memory (if
                                        you have enough!)
  --help (-h)                           Print out help
  --tempDir tempDir                     Intermediate output directory
  --startPhase startPhase               First phase to run
  --endPhase endPhase                   Last phase to run
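As a rough sizing aid, the --rank note and the --inMemory caveat can be turned into arithmetic. This is a hypothetical sketch (SvdSizing is not a Mahout class). It assumes, as the help text says, that only 1/4 to 1/3 of the requested rank lands in the top of the spectrum, and it further assumes that each buffered eigenvector is a dense vector of numCols 8-byte doubles.

```java
/**
 * Hypothetical sizing helper for a `mahout svd` run (not part of Mahout).
 * Assumptions: 1/4 to 1/3 of --rank lands in the top of the spectrum, and
 * --inMemory buffers `rank` dense vectors of `numCols` 8-byte doubles.
 */
public final class SvdSizing {

  private SvdSizing() {}

  /** --rank needed for `wanted` good vectors if 1/3 of them are good. */
  public static int minRank(int wanted) {
    return 3 * wanted;
  }

  /** --rank needed for `wanted` good vectors in the pessimistic 1/4 case. */
  public static int maxRank(int wanted) {
    return 4 * wanted;
  }

  /** Approximate bytes for `rank` dense vectors of length `numCols`. */
  public static long denseEigenBytes(int rank, int numCols) {
    return (long) rank * numCols * Double.BYTES;
  }

  public static void main(String[] args) {
    int wanted = 50;        // top-spectrum vectors we hope to end up with
    int numCols = 100_000;  // --numCols of the input matrix
    int rank = maxRank(wanted);
    System.out.println("--rank between " + minRank(wanted) + " and " + rank);
    System.out.printf("~%.1f MB to buffer %d vectors in memory%n",
        denseEigenBytes(rank, numCols) / 1e6, rank);
  }
}
```

With these inputs it suggests a --rank of 150 to 200 and about 160 MB of raw vector data; actual heap use will be somewhat higher once object overhead is counted.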
On 9/14/10 6:55 AM, Jake Mannix wrote:
I guess the main thing I'd want from combining EVJ and DLS is a clear final
output (changing the semantics of the CLI param is ok): it should be either
the output of EVJ (if that is used) or the output of DLS (if EVJ is not
used). If that can be done, go for it!
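The requirement above can be sketched as control flow: DLS always runs, EVJ runs only with --cleansvd, and --output ends up holding the result of whichever job ran last. This is a hypothetical Java sketch, not the committed code: LanczosStep and VerificationStep are stand-ins for DistributedLanczosSolver and EigenVerificationJob, and parking the raw vectors in a "rawEigenvectors" directory under --tempDir is an assumed layout.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical control flow for folding EVJ into DLS (not Mahout's code).
 * LanczosStep and VerificationStep stand in for DistributedLanczosSolver and
 * EigenVerificationJob; the "rawEigenvectors" directory name is assumed.
 */
public final class SvdPipelineSketch {

  private SvdPipelineSketch() {}

  /** Stand-in for DLS: writes eigenvectors to the given directory. */
  interface LanczosStep {
    void run(Path eigenvectorOutput);
  }

  /** Stand-in for EVJ: reads raw eigenvectors and writes cleaned ones. */
  interface VerificationStep {
    void run(Path rawEigenvectors, Path cleanOutput, double maxError,
             double minEigenvalue, boolean inMemory);
  }

  /** Runs DLS and, if cleanSvd is set, EVJ; the final result lands in output. */
  static void run(Path output, Path tempDir, boolean cleanSvd, double maxError,
                  double minEigenvalue, boolean inMemory,
                  LanczosStep dls, VerificationStep evj) {
    if (!cleanSvd) {
      dls.run(output);  // no cleaning: DLS output is the final output
      return;
    }
    // Cleaning requested: park the raw DLS vectors in an intermediate
    // directory so that output holds only EVJ's result.
    Path raw = tempDir.resolve("rawEigenvectors");
    dls.run(raw);
    evj.run(raw, output, maxError, minEigenvalue, inMemory);
  }

  public static void main(String[] args) {
    LanczosStep dls = dir -> System.out.println("DLS writes eigenvectors to " + dir);
    VerificationStep evj = (raw, clean, err, mev, mem) ->
        System.out.println("EVJ cleans " + raw + " into " + clean);
    run(Paths.get("/tmp/svd-out"), Paths.get("/tmp/svd-tmp"), true,
        0.05, 0.0, false, dls, evj);
  }
}
```

Routing the raw vectors through an intermediate directory is one way to resolve the --output conflict: when cleaning is enabled, DLS no longer writes directly to --output.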
-jake
On Tue, Sep 14, 2010 at 6:30 AM, Jeff Eastman wrote:
Jake, I see you are online. I'm inclined to push forward on this despite
the adjustments it requires to the DLS --output semantics. Agreed?
On 9/13/10 10:34 AM, Jeff Eastman wrote:
r996599 completed the first part. Several additional arguments to EVJ.run
need to be added to DLS (maxError, minEigenValue, inMemory, and the
--cleansvd flag itself). Also, DLS interprets --output as the
outputEigenVectorPath rather than as the generic output directory, so
DLS.run() will need another argument too. Still want to do this?
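To make the maxError and minEigenvalue arguments concrete, here is a hypothetical filter over candidate eigenvectors. It does not reproduce EVJ's code; it assumes that the error for a vector v is 1 - |cos(v, Av)| (0 when Av is exactly parallel to v) and that the eigenvalue is estimated by the Rayleigh quotient v·Av / v·v, where A is the symmetric matrix whose eigenvectors DLS approximates. Plain double arrays stand in for Mahout vectors.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical filter illustrating --maxError and --minEigenvalue.
 * Assumptions (not taken from EigenVerificationJob): error = 1 - |cos(v, Av)|
 * and the eigenvalue estimate is the Rayleigh quotient v.Av / v.v.
 */
public final class EigenPruneSketch {

  private EigenPruneSketch() {}

  /** Dense matrix-vector product a * v. */
  static double[] times(double[][] a, double[] v) {
    double[] out = new double[a.length];
    for (int i = 0; i < a.length; i++) {
      for (int j = 0; j < v.length; j++) {
        out[i] += a[i][j] * v[j];
      }
    }
    return out;
  }

  static double dot(double[] x, double[] y) {
    double sum = 0.0;
    for (int i = 0; i < x.length; i++) {
      sum += x[i] * y[i];
    }
    return sum;
  }

  /** Indices of candidate vectors that pass both thresholds. */
  static List<Integer> keep(double[][] a, List<double[]> candidates,
                            double maxError, double minEigenvalue) {
    List<Integer> kept = new ArrayList<>();
    for (int k = 0; k < candidates.size(); k++) {
      double[] v = candidates.get(k);
      double[] av = times(a, v);
      double vv = dot(v, v);
      double vAv = dot(v, av);
      double avAv = dot(av, av);
      double eigenvalue = vAv / vv;  // Rayleigh quotient
      // If Av = 0, v is an exact eigenvector with eigenvalue 0.
      double cos = avAv == 0.0 ? 1.0 : vAv / Math.sqrt(vv * avAv);
      double error = 1.0 - Math.abs(cos);  // 0 when Av is parallel to v
      if (error <= maxError && eigenvalue >= minEigenvalue) {
        kept.add(k);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    double[][] a = {{2, 0}, {0, 1}};
    List<double[]> candidates = new ArrayList<>();
    candidates.add(new double[] {1, 0});  // eigenvector, eigenvalue 2
    candidates.add(new double[] {0, 1});  // eigenvector, eigenvalue 1
    candidates.add(new double[] {1, 1});  // not an eigenvector of a
    System.out.println(keep(a, candidates, 0.01, 0.5));  // prints [0, 1]
  }
}
```

The third candidate, [1, 1], is rejected because its error (about 0.05) exceeds maxError = 0.01; raising minEigenvalue to 1.5 would also drop the second candidate.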
On 9/12/10 2:19 PM, Jake Mannix wrote:
+1 on folding EigenVerificationJob into DistributedLanczosSolver, or at
least implementing a job() method on EVJ.
+1 for having the latter, with a boolean flag in DLS to optionally call
EVJ after it's done.