[ 
https://issues.apache.org/jira/browse/MAHOUT-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13599199#comment-13599199
 ] 

Dmitriy Lyubimov commented on MAHOUT-1159:
------------------------------------------

+                       SSVDSolver solveIt = new SSVDSolver(
+                                       depConf, 
+                                       LPath, 
+                                       SSVDout, 
+                                       1000, // Vertical height of a q-block
+                                       clusters, 
+                                       15, // Oversampling 
+                                       10);


The vertical block height is both hardcoded and too small; the default in the 
CLI sets it at 30,000.
Oversampling is hardcoded.
Reducers are hardcoded.

I feel a better practice would be to expose these as additional (optional) 
parameters in Spectral K-Means. Use the defaults from SSVDCli. 

You do need to pass reducers to the solver, as the solver manages a bunch of 
jobs: some of them are map-only (reducers must be 0) and some of them are 
small-scale reductions with 1 reducer (I am not sure about this one, though). 
Anyway, this is a mandatory parameter because in far too many cases people did 
not specify it and it defaulted to 1 (no parallelization). So it is forced 
through, but you need to expose it through optional overrides.

Think of "mkfs" in Linux. It does have a way to pass custom parameters to a 
particular fs formatter (such as mkfs.ext2). Same here.
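
To make the suggestion concrete, here is a minimal sketch of how such optional 
overrides could be collected before the solver is built. The class name 
SsvdOverrides and the option names ("reducers", "ssvdBlockHeight", 
"ssvdOversampling") are hypothetical and are not part of Mahout. The block 
height default is the 30,000 mentioned above; using 15 as the oversampling 
default is an assumption taken from the value in the patch. Reducers stay 
mandatory.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder for the SSVD settings Spectral K-Means could expose. */
public class SsvdOverrides {
    // Block height that the CLI is said to use by default (see comment above).
    public static final int DEFAULT_BLOCK_HEIGHT = 30000;
    // Assumption: reuse the oversampling value from the patch as the default.
    public static final int DEFAULT_OVERSAMPLING = 15;

    public final int blockHeight;
    public final int oversampling;
    public final int reducers;

    public SsvdOverrides(int blockHeight, int oversampling, int reducers) {
        if (reducers < 1) {
            throw new IllegalArgumentException("reducers must be at least 1");
        }
        this.blockHeight = blockHeight;
        this.oversampling = oversampling;
        this.reducers = reducers;
    }

    /** Reads already-parsed options; anything missing falls back to a default. */
    public static SsvdOverrides fromOptions(Map<String, String> opts) {
        // Reducers have no silent default, so nobody ends up with 1 by accident.
        if (!opts.containsKey("reducers")) {
            throw new IllegalArgumentException("reducers must be given explicitly");
        }
        int reducers = Integer.parseInt(opts.get("reducers"));
        int blockHeight = opts.containsKey("ssvdBlockHeight")
            ? Integer.parseInt(opts.get("ssvdBlockHeight"))
            : DEFAULT_BLOCK_HEIGHT;
        int oversampling = opts.containsKey("ssvdOversampling")
            ? Integer.parseInt(opts.get("ssvdOversampling"))
            : DEFAULT_OVERSAMPLING;
        return new SsvdOverrides(blockHeight, oversampling, reducers);
    }

    public static void main(String[] args) {
        Map<String, String> opts = new HashMap<>();
        opts.put("reducers", "10");           // mandatory
        opts.put("ssvdOversampling", "20");   // optional override
        SsvdOverrides s = fromOptions(opts);
        System.out.println(s.blockHeight + " " + s.oversampling + " " + s.reducers);
    }
}
```

The three fields would then take the place of the hardcoded 1000, 15 and 10 in 
the SSVDSolver constructor call quoted above.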





+                       // May want to update SSVD documentation on this one: method doc
+                       // says "false" is the default, yet it's set to true in the
+                       // variable definition.
+                       //solveIt.setBroadcast(false);

That may be true. It did go back and forth as far as I can recollect, but I was 
under the impression I had fixed that. I'll check again. Surprisingly, it did 
not make much difference in my tests; actually, the versions that used the 
distributed cache performed a tiny bit worse for some reason.

                
> Add SSVD option to SpectralKMeans
> ---------------------------------
>
>                 Key: MAHOUT-1159
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1159
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.8
>            Reporter: Shannon Quinn
>            Assignee: Shannon Quinn
>            Priority: Minor
>             Fix For: 0.8
>
>         Attachments: MAHOUT-1159.patch
>
>
> This adds SSVD as an option for eigensolver, in addition to the [default] 
> Lanczos solver. Testing indicated it yielded similar clustering accuracy with 
> a possible performance boost.
> This patch includes other small fixes, such as using the default "tempDir" 
> for intermediate calculations.
> The initialization of the SSVD solver is a bit awkward, with specifying the 
> number of reducers. I hard-coded this at 10; is there a better solution? 
> Perhaps making it an optional parameter to the SSVD constructor?
> [Thanks to University of Pittsburgh CS undergraduates Andrew King, Pawan 
> Solanki, and Philip Schinis for working on this.]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
