[ https://issues.apache.org/jira/browse/MAHOUT-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018467#comment-13018467 ]
Jake Mannix commented on MAHOUT-319:
------------------------------------
Ok, so I've got a patch which should make this a lot more doable (at
least for Lanczos/DistributedLanczos). It refactors the solver to take a
LanczosState object which encapsulates the running state (the current basis
vectors, their projections and norms), so that on each iteration a call to
state.setIterationNumber(i) can have the side effect of persisting to
disk/HDFS. Additionally, because the state is hidden inside this object, it can
transparently *not* keep everything in memory, which could reduce the overall
memory usage of the solver by a huge margin (at the cost of having to go to disk
for some final non-M/R parts of the algorithm). Tests are running now; I'll
upload the patch and then do some cleanup later.
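
To make the idea concrete, here is a minimal sketch of a state object whose
setIterationNumber(i) persists as a side effect and which drops vectors from
memory once they are on disk. It is not the code from the patch: the class
name SketchLanczosState, the file layout, and the use of plain double[]
arrays on local disk (rather than Mahout vectors on HDFS) are all assumptions
made for illustration.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a solver state object; not the class from the patch. */
class SketchLanczosState {
  private final Path dir;                                          // checkpoint directory (local disk here)
  private final Map<Integer, double[]> pending = new HashMap<>();  // vectors not yet flushed to disk
  private int iteration;

  SketchLanczosState(Path dir) throws IOException {
    this.dir = Files.createDirectories(dir);
    Path meta = dir.resolve("iteration.bin");
    if (Files.exists(meta)) {                                      // resume from a previous run
      try (DataInputStream in = new DataInputStream(Files.newInputStream(meta))) {
        iteration = in.readInt();
      }
    }
  }

  int getIterationNumber() { return iteration; }

  void setBasisVector(int i, double[] v) { pending.put(i, v.clone()); }

  /** Returns basis vector i, from memory if unflushed, otherwise from disk. */
  double[] getBasisVector(int i) throws IOException {
    double[] v = pending.get(i);
    if (v != null) return v.clone();
    try (DataInputStream in = new DataInputStream(Files.newInputStream(vectorFile(i)))) {
      double[] out = new double[in.readInt()];
      for (int k = 0; k < out.length; k++) out[k] = in.readDouble();
      return out;
    }
  }

  /** Side effect: flush pending vectors to disk, then record the iteration number. */
  void setIterationNumber(int i) throws IOException {
    for (Map.Entry<Integer, double[]> e : pending.entrySet()) {
      try (DataOutputStream out =
               new DataOutputStream(Files.newOutputStream(vectorFile(e.getKey())))) {
        out.writeInt(e.getValue().length);
        for (double x : e.getValue()) out.writeDouble(x);
      }
    }
    pending.clear();                                               // vectors now live only on disk
    Path tmp = dir.resolve("iteration.tmp");
    try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(tmp))) {
      out.writeInt(i);
    }
    // Rename last, so the recorded iteration never points past what was flushed.
    Files.move(tmp, dir.resolve("iteration.bin"), StandardCopyOption.REPLACE_EXISTING);
    iteration = i;
  }

  private Path vectorFile(int i) { return dir.resolve("basis_" + i + ".bin"); }
}
```

A restarted run would construct the state on the same directory, read
getIterationNumber(), and continue from there, pulling earlier basis vectors
back from disk only when they are needed.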
> SVD solvers should be gracefully stoppable/restartable
> ------------------------------------------------------
>
> Key: MAHOUT-319
> URL: https://issues.apache.org/jira/browse/MAHOUT-319
> Project: Mahout
> Issue Type: Improvement
> Components: Math
> Affects Versions: 0.3
> Reporter: Jake Mannix
> Assignee: Jake Mannix
>
> LanczosSolver, DistributedLanczosSolver, and HebbianSolver all keep copious
> amounts of memory-resident data which is lost if the app crashes or is killed
> (OOM, forgetting to run in a screen session, and losing net connectivity to
> the server running it, etc...).
> These algorithms (and many other Mahout processes!) should enable a pluggable
> "persist state" mechanism (to HDFS, RDBMS, local disk, key-value store, etc),
> and similarly, a way to pick up and start from such a state.
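
One way the pluggable "persist state" mechanism described above could look is
a small storage interface with interchangeable backends. The sketch below is
only an illustration: StateStore, LocalDiskStateStore, InMemoryStateStore and
the jobId key are hypothetical names, not part of Mahout, and the in-memory
class merely stands in for a key-value store.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical plug-in point: where serialized solver state is written and read. */
interface StateStore {
  void save(String jobId, byte[] state) throws IOException;
  Optional<byte[]> load(String jobId) throws IOException;
}

/** Local-disk backend: one file per job. */
class LocalDiskStateStore implements StateStore {
  private final Path dir;

  LocalDiskStateStore(Path dir) throws IOException {
    this.dir = Files.createDirectories(dir);
  }

  public void save(String jobId, byte[] state) throws IOException {
    Files.write(dir.resolve(jobId + ".state"), state);
  }

  public Optional<byte[]> load(String jobId) throws IOException {
    Path p = dir.resolve(jobId + ".state");
    return Files.exists(p) ? Optional.of(Files.readAllBytes(p)) : Optional.empty();
  }
}

/** In-memory backend, standing in for a key-value store. */
class InMemoryStateStore implements StateStore {
  private final Map<String, byte[]> map = new ConcurrentHashMap<>();

  public void save(String jobId, byte[] state) {
    map.put(jobId, state.clone());
  }

  public Optional<byte[]> load(String jobId) {
    byte[] b = map.get(jobId);
    return b == null ? Optional.empty() : Optional.of(b.clone());
  }
}
```

A solver written against StateStore would call load(jobId) at startup to pick
up from a saved state if one exists, and save(jobId, ...) at each checkpoint,
without knowing which backend is configured.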