[
https://issues.apache.org/jira/browse/MAHOUT-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021641#comment-13021641
]
Jake Mannix commented on MAHOUT-319:
------------------------------------
Hi Saikat,
Yes, modifying the code in this patch to use a LocalFileSystem instance
specifically would most likely be helpful.
What would actually be most helpful is improving the efficiency of the HDFS
persistence. My current design is super-dumb: after each Lanczos iteration it
persists the new basis vector directly to HDFS (which is fine), but then, when
iterating through the basis vectors to construct the final singular vectors,
it just reads them raw back from HDFS again. A better approach would probably
store them temporarily on local disk.
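As a rough illustration of the "store them temporarily locally" idea, here is a
minimal Java sketch. It assumes dense double[] vectors and uses plain
java.nio.file on the local disk rather than Hadoop's LocalFileSystem; the class
LocalBasisStore and its methods are hypothetical, not part of Mahout's API.
Each basis vector is written to its own file as it is produced. The final
vectors, which in Lanczos are linear combinations of the basis vectors with
coefficients taken from the eigenvectors of the small tridiagonal matrix, are
then built in one sequential pass that keeps a single basis vector in memory at
a time.

```java
import java.io.*;
import java.nio.file.*;

/**
 * Hypothetical local-disk store for Lanczos basis vectors (not Mahout's API).
 * Each basis vector goes to its own file as soon as it is computed, so only
 * the current vector has to stay in RAM during the iterations.
 */
final class LocalBasisStore implements Closeable {
  private final Path dir;
  private int count = 0;

  LocalBasisStore() throws IOException {
    this.dir = Files.createTempDirectory("lanczos-basis");
  }

  /** Persist the next basis vector (indices 0, 1, 2, ...) to local disk. */
  void append(double[] v) throws IOException {
    Path file = dir.resolve("basis-" + count);
    try (DataOutputStream out = new DataOutputStream(
        new BufferedOutputStream(Files.newOutputStream(file)))) {
      out.writeInt(v.length);
      for (double x : v) out.writeDouble(x);
    }
    count++;
  }

  int size() { return count; }

  /** Read basis vector i back from local disk. */
  double[] read(int i) throws IOException {
    try (DataInputStream in = new DataInputStream(
        new BufferedInputStream(Files.newInputStream(dir.resolve("basis-" + i))))) {
      double[] v = new double[in.readInt()];
      for (int j = 0; j < v.length; j++) v[j] = in.readDouble();
      return v;
    }
  }

  /**
   * Build u_k = sum_i coeffs[k][i] * basis_i for every k in a single pass:
   * each stored basis vector is read once and added into all outputs.
   */
  double[][] combine(double[][] coeffs, int dim) throws IOException {
    double[][] out = new double[coeffs.length][dim];
    for (int i = 0; i < count; i++) {
      double[] b = read(i);
      for (int k = 0; k < coeffs.length; k++) {
        double c = coeffs[k][i];
        for (int j = 0; j < dim; j++) out[k][j] += c * b[j];
      }
    }
    return out;
  }

  /** Delete the temporary files once the final vectors have been built. */
  @Override
  public void close() throws IOException {
    for (int i = 0; i < count; i++) Files.deleteIfExists(dir.resolve("basis-" + i));
    Files.deleteIfExists(dir);
  }
}
```

The RAM/time tradeoff in this sketch: memory holds the k output vectors plus one
basis vector, at the cost of reading every stored vector once from local disk.
Keeping all basis vectors in memory would avoid that I/O but needs memory that
grows with the number of iterations.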
Can you try out this patch, see how it looks, and let me know whether you have
any ideas for improving the RAM/time and local/remote filesystem tradeoffs?
I will probably commit this to trunk this week; any ideas for improving it
from there would be great.
> SVD solvers should be gracefully stoppable/restartable
> ------------------------------------------------------
>
> Key: MAHOUT-319
> URL: https://issues.apache.org/jira/browse/MAHOUT-319
> Project: Mahout
> Issue Type: Improvement
> Components: Math
> Affects Versions: 0.3
> Reporter: Jake Mannix
> Assignee: Jake Mannix
> Fix For: 0.5
>
> Attachments: MAHOUT-319.diff, MAHOUT-319.patch
>
>
> LanczosSolver, DistributedLanczosSolver, and HebbianSolver all keep copious
> amounts of memory-resident data which is lost if the app crashes or is killed
> (OOM, forgetting to run in a screen session, losing network connectivity to
> the server running it, etc.).
> These algorithms (and many other Mahout processes!) should enable a pluggable
> "persist state" mechanism (to HDFS, RDBMS, local disk, key-value store, etc.),
> and similarly, a way to pick up and start from such a state.
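One possible shape for such a pluggable "persist state" mechanism, sketched in
Java under loose assumptions: StateStore, LocalFileStateStore, Checkpoint, and
RestartableLoop are hypothetical names, not existing Mahout classes. The
local-disk backend uses Java serialization, and the doubling step is only a
stand-in for a real solver iteration. An HDFS, RDBMS, or key-value backend
would implement the same two methods.

```java
import java.io.*;
import java.nio.file.*;
import java.util.Optional;

/** Hypothetical pluggable checkpoint interface; backends could be HDFS, RDBMS, local disk, a KV store. */
interface StateStore<S extends Serializable> {
  void save(S state) throws IOException;
  Optional<S> load() throws IOException;
}

/**
 * Hypothetical local-disk backend. It writes to a temp file and then renames it
 * atomically, so a crash mid-write leaves the previous checkpoint intact
 * (Files.move throws AtomicMoveNotSupportedException where that is impossible).
 */
final class LocalFileStateStore<S extends Serializable> implements StateStore<S> {
  private final Path file;
  private final Class<S> type;

  LocalFileStateStore(Path file, Class<S> type) {
    this.file = file;
    this.type = type;
  }

  @Override
  public void save(S state) throws IOException {
    Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
    try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
      out.writeObject(state);
    }
    Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
  }

  @Override
  public Optional<S> load() throws IOException {
    if (!Files.exists(file)) return Optional.empty();
    try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
      return Optional.of(type.cast(in.readObject()));
    } catch (ClassNotFoundException e) {
      throw new IOException(e);
    }
  }
}

/** Hypothetical solver state: the iteration number plus the current vector. */
final class Checkpoint implements Serializable {
  private static final long serialVersionUID = 1L;
  final int iteration;
  final double[] vector;

  Checkpoint(int iteration, double[] vector) {
    this.iteration = iteration;
    this.vector = vector;
  }
}

final class RestartableLoop {
  /** Runs until maxIterations, resuming from the last saved checkpoint if there is one. */
  static Checkpoint run(StateStore<Checkpoint> store, int maxIterations) throws IOException {
    Checkpoint cp = store.load().orElse(new Checkpoint(0, new double[] {1.0}));
    while (cp.iteration < maxIterations) {
      double[] next = cp.vector.clone();
      next[0] *= 2; // stand-in for one real solver step (hypothetical)
      cp = new Checkpoint(cp.iteration + 1, next);
      store.save(cp); // persist after every iteration
    }
    return cp;
  }
}
```

Running the loop to iteration 3, then constructing a fresh store on the same
path and asking for 5 iterations, resumes at iteration 3 instead of starting
over from scratch.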
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira