[ 
https://issues.apache.org/jira/browse/MAHOUT-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018467#comment-13018467
 ] 

Jake Mannix commented on MAHOUT-319:
------------------------------------

OK, so I've actually got a patch which should make this a lot more doable (at 
least for Lanczos/DistributedLanczos). It refactors the solver to take a 
LanczosState object which encapsulates the running state (the current basis 
vectors, their projections and norms), so that on each iteration a call to 
state.setIterationNumber(i) can have the side effect of persisting to 
disk/HDFS.  Additionally, by hiding the state inside this object, it can 
transparently *not* keep everything in memory, which could reduce the overall 
memory usage of the solver by a huge margin (at the cost of having to go to 
disk for some final non-M/R parts of the algorithm).  Tests are running now; 
I'll upload the patch and then do some cleanup later.
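
To make the idea concrete, here is a rough sketch of a state object whose 
setIterationNumber(i) call flushes to local disk as a side effect. This is 
not the code from the patch: the class name CheckpointingLanczosState, the 
file layout, and every method other than setIterationNumber are assumptions 
made up for illustration, and plain double[] arrays stand in for Mahout 
vectors.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; not the LanczosState class from the MAHOUT-319 patch.
class CheckpointingLanczosState {
    private final Path dir;                                   // checkpoint directory
    private final Map<Integer, double[]> unsaved = new HashMap<>(); // vectors not yet flushed
    private int iterationNumber;

    CheckpointingLanczosState(Path dir) throws IOException {
        this.dir = Files.createDirectories(dir);
    }

    void setBasisVector(int i, double[] v) {
        unsaved.put(i, v.clone());
    }

    // Serves a vector from memory if it has not been flushed yet, otherwise from disk.
    double[] getBasisVector(int i) throws IOException {
        double[] v = unsaved.get(i);
        return v != null ? v.clone() : readVector(dir.resolve("basis-" + i));
    }

    // Side effect: flush pending vectors, drop them from memory, then record
    // the iteration number last so a half-written flush is not taken as complete.
    void setIterationNumber(int i) throws IOException {
        for (Map.Entry<Integer, double[]> e : unsaved.entrySet()) {
            writeVector(dir.resolve("basis-" + e.getKey()), e.getValue());
        }
        unsaved.clear();
        try (DataOutputStream out =
                 new DataOutputStream(Files.newOutputStream(dir.resolve("iteration")))) {
            out.writeInt(i);
        }
        iterationNumber = i;
    }

    int getIterationNumber() {
        return iterationNumber;
    }

    int inMemoryVectorCount() {
        return unsaved.size();
    }

    private static void writeVector(Path file, double[] v) throws IOException {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            out.writeInt(v.length);
            for (double x : v) {
                out.writeDouble(x);
            }
        }
    }

    private static double[] readVector(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            double[] v = new double[in.readInt()];
            for (int j = 0; j < v.length; j++) {
                v[j] = in.readDouble();
            }
            return v;
        }
    }
}
```

The solver only ever talks to the state object, so whether a basis vector 
comes from memory or from disk is invisible to it. That is what would allow 
the memory footprint to shrink without touching the iteration logic.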

> SVD solvers should be gracefully stoppable/restartable
> ------------------------------------------------------
>
>                 Key: MAHOUT-319
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-319
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.3
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>
> LanczosSolver, DistributedLanczosSolver, and HebbianSolver all keep copious 
> amounts of memory-resident data, which is lost if the app crashes or is 
> killed (OOM, forgetting to run in a screen session and then losing net 
> connectivity to the server running it, etc.).  
> These algorithms (and many other Mahout processes!) should enable a pluggable 
> "persist state" mechanism (to HDFS, RDBMS, local disk, key-value store, etc.), 
> and similarly, a way to pick up and restart from such a state.
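
One possible shape for the pluggable "persist state" mechanism described in 
the issue is sketched below. All of the names here (Snapshot, StateStore, 
InMemoryStateStore, RestartableSolver) are hypothetical and do not exist in 
Mahout. The in-memory store is only a stand-in for a real backend, and the 
"solver" is a toy iteration that halves each entry, used purely to show the 
stop-and-resume flow.

```java
import java.util.Optional;

// Hypothetical names throughout; none of these types exist in Mahout.
final class Snapshot {
    final int iteration;   // number of completed iterations
    final double[] state;  // solver state after that many iterations

    Snapshot(int iteration, double[] state) {
        this.iteration = iteration;
        this.state = state.clone();
    }
}

// The pluggable part: HDFS, RDBMS, local disk or key-value backends would
// each provide their own implementation of this interface.
interface StateStore {
    void save(Snapshot snapshot);

    Optional<Snapshot> loadLatest();
}

// Stand-in backend for illustration; it does not survive a process restart.
final class InMemoryStateStore implements StateStore {
    private Snapshot latest;

    public void save(Snapshot snapshot) {
        latest = snapshot;
    }

    public Optional<Snapshot> loadLatest() {
        return Optional.ofNullable(latest);
    }
}

final class RestartableSolver {
    private final StateStore store;

    RestartableSolver(StateStore store) {
        this.store = store;
    }

    // Runs until stopAt iterations are complete, resuming from the latest
    // snapshot if the store has one and checkpointing after every step.
    double[] run(double[] initial, int stopAt) {
        Snapshot start = store.loadLatest().orElse(new Snapshot(0, initial));
        double[] x = start.state.clone();
        for (int i = start.iteration; i < stopAt; i++) {
            for (int j = 0; j < x.length; j++) {
                x[j] *= 0.5; // toy update standing in for one solver iteration
            }
            store.save(new Snapshot(i + 1, x));
        }
        return x;
    }
}
```

A run that stops after two iterations and a second solver instance that 
picks up from the same store end in the same state as one uninterrupted 
run, which is the property the issue asks for.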

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
