[
https://issues.apache.org/jira/browse/MAHOUT-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13599199#comment-13599199
]
Dmitriy Lyubimov commented on MAHOUT-1159:
------------------------------------------
+ SSVDSolver solveIt = new SSVDSolver(
+ depConf,
+ LPath,
+ SSVDout,
+ 1000, // Vertical height of a q-block
+ clusters,
+ 15, // Oversampling
+ 10);
The vertical block height is both hardcoded and too small; the default in the
CLI sets it at 30,000.
Oversampling is hardcoded.
Reducers are hardcoded.
I feel a better practice would be to expose these as additional (optional)
parameters in Spectral K-Means. Use the defaults from SSVDCli.
You do need to pass reducers to the solver, as the solver manages a bunch of
jobs: some of them are map-only (reducers must be 0) and some of them are
small-scale reductions with 1 reducer (I am not sure about this one, though).
Anyway, this is a mandatory parameter because in far too many cases people did
not specify it and it defaulted to 1 (no parallelization). So it is forced
through in the solver, but you need to expose it through optional overrides.
Think of "mkfs" in Linux. It does have a way to pass custom parameters to a
particular fs formatter (such as mkfs.ext2). Same here.
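A minimal sketch of what such optional overrides could look like, in plain Java
with no Mahout dependency. The class name SsvdOverrides and the option keys
(ssvdBlockHeight, ssvdOversampling, ssvdReducers) are hypothetical and not part
of Mahout. The 30,000 default is the CLI block height mentioned above; 15 and
10 are simply the values hardcoded in the patch, used here as placeholders
rather than confirmed SSVDCli defaults.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder for SSVD tuning knobs; illustration only, not Mahout API. */
final class SsvdOverrides {
  // 30,000 is the CLI default block height cited in the review.
  static final int DEFAULT_BLOCK_HEIGHT = 30000;
  // Placeholder defaults: the values hardcoded in the patch.
  static final int DEFAULT_OVERSAMPLING = 15;
  static final int DEFAULT_REDUCERS = 10;

  final int blockHeight;
  final int oversampling;
  final int reducers;

  private SsvdOverrides(int blockHeight, int oversampling, int reducers) {
    this.blockHeight = blockHeight;
    this.oversampling = oversampling;
    this.reducers = reducers;
  }

  /** Builds the settings from optional overrides, falling back to defaults. */
  static SsvdOverrides fromOptions(Map<String, String> opts) {
    int blockHeight = intOption(opts, "ssvdBlockHeight", DEFAULT_BLOCK_HEIGHT);
    int oversampling = intOption(opts, "ssvdOversampling", DEFAULT_OVERSAMPLING);
    int reducers = intOption(opts, "ssvdReducers", DEFAULT_REDUCERS);
    if (blockHeight < 1 || oversampling < 0 || reducers < 1) {
      throw new IllegalArgumentException("invalid SSVD override value");
    }
    return new SsvdOverrides(blockHeight, oversampling, reducers);
  }

  // Returns the parsed value for key, or dflt when the option is absent.
  private static int intOption(Map<String, String> opts, String key, int dflt) {
    String raw = opts.get(key);
    return raw == null ? dflt : Integer.parseInt(raw.trim());
  }

  public static void main(String[] args) {
    Map<String, String> opts = new HashMap<>();
    opts.put("ssvdReducers", "40"); // the user overrides only the reducer count
    SsvdOverrides o = fromOptions(opts);
    // prints "30000 15 40"
    System.out.println(o.blockHeight + " " + o.oversampling + " " + o.reducers);
  }
}
```

The resulting three values would then be passed to the solver constructor in
place of the literals 1000, 15 and 10, so the solver still always receives an
explicit reducer count.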
+ // May want to update SSVD documentation on this one: method doc
+ // says "false" is the default, yet it's set to true in the
+ // variable definition.
+ //solveIt.setBroadcast(false);
That may be true. It did go back and forth as far as I can recollect, but I was
under the impression I had fixed that. I'll check again. Surprisingly, it did
not make much difference in my tests; actually, the versions that used the
distributed cache performed a tiny bit worse for some reason.
> Add SSVD option to SpectralKMeans
> ---------------------------------
>
> Key: MAHOUT-1159
> URL: https://issues.apache.org/jira/browse/MAHOUT-1159
> Project: Mahout
> Issue Type: Improvement
> Components: Clustering
> Affects Versions: 0.8
> Reporter: Shannon Quinn
> Assignee: Shannon Quinn
> Priority: Minor
> Fix For: 0.8
>
> Attachments: MAHOUT-1159.patch
>
>
> This adds SSVD as an option for eigensolver, in addition to the [default]
> Lanczos solver. Testing indicated it yielded similar clustering accuracy with
> a possible performance boost.
> This patch includes other small fixes, such as using the default "tempDir"
> for intermediate calculations.
> The initialization of the SSVD solver is a bit awkward, with specifying the
> number of reducers. I hard-coded this at 10; is there a better solution?
> Perhaps making it an optional parameter to the SSVD constructor?
> [Thanks to University of Pittsburgh CS undergraduates Andrew King, Pawan
> Solanki, and Philip Schinis for working on this.]