> - Does this fail always or just sometimes?

Always

> - When it finishes, is the result wrong? Just curious, how do you compare 20gb
> of text files? ;D

Never finishes.

> - In case it is really the combiner, does pagerank work without problems?

Never finishes if input is large.

Sent from my iPad

On Sep 29, 2012, at 5:07 AM, "Thomas Jungblut (JIRA)" <[email protected]> wrote:

>
> [ https://issues.apache.org/jira/browse/HAMA-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465866#comment-13465866 ]
>
> Thomas Jungblut commented on HAMA-642:
> --------------------------------------
>
> A race is not good. We have to investigate a bit deeper I guess. I don't
> think that there is a concurrency problem inside of jdbm, but I will have a
> look, maybe there is some resource that is static. However, each task has its
> own mutually exclusive "database", so I don't see a problem there.
>
> My first guess was the use of the combiner. So here are my questions:
> - Does this fail always or just sometimes?
> - When it finishes, is the result wrong? Just curious, how do you compare 20gb
> of text files? ;D
> - In case it is really the combiner, does pagerank work without problems?
>
> I will build a smaller cluster in the near future to test these things more
> efficiently.
>
>> Make GraphRunner disk based
>> ---------------------------
>>
>> Key: HAMA-642
>> URL: https://issues.apache.org/jira/browse/HAMA-642
>> Project: Hama
>> Issue Type: Improvement
>> Components: graph
>> Affects Versions: 0.5.0
>> Reporter: Thomas Jungblut
>> Assignee: Edward J. Yoon
>> Attachments: HAMA-642_unix_1.patch, HAMA-642_unix_2.patch,
>> HAMA-scale_1.patch, HAMA-scale_2.patch, HAMA-scale_3.patch,
>> HAMA-scale_4.patch
>>
>>
>> To improve scalability we can improve the graph runner to be disk based.
>> Which basically means:
>> - We have just a single Vertex instance that gets refilled.
>> - We directly write vertices to disk after partitioning
>> - In every superstep we iterate over the vertices on disk, fill the vertex
>> instance and call the user's compute functions
>> Problems:
>> - State other than vertex value can't be stored easily
>> - How do we deal with random access after messages have arrived?
>> So I think we should make the graph runner more hybrid, like using the
>> queues we have implemented in the messaging. So the graphrunner can be
>> configured to run completely on disk, in cached mode or in in-memory mode.
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira
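
For readers following the quoted issue description, the disk-based loop it outlines (write vertices to disk after partitioning, then refill a single vertex instance in every superstep) might look roughly like the sketch below. It is a standalone Python illustration, not Hama code. The fixed-size record layout and the names `Vertex`, `write_partition`, `run_superstep` and `read_all` are all assumptions made for the example, and it does nothing about the two problems the issue raises (extra vertex state, random access after messages arrive).

```python
import os
import struct
import tempfile

# Hypothetical record layout: vertex id (int64) followed by vertex value (float64).
RECORD = struct.Struct("<qd")


class Vertex:
    """The single reusable vertex object; refilled for every record on disk."""

    def __init__(self):
        self.vertex_id = 0
        self.value = 0.0

    def fill(self, vertex_id, value):
        self.vertex_id = vertex_id
        self.value = value


def write_partition(path, vertices):
    """Write (id, value) pairs straight to disk, as would happen after partitioning."""
    with open(path, "wb") as out:
        for vertex_id, value in vertices:
            out.write(RECORD.pack(vertex_id, value))


def run_superstep(path, compute):
    """Stream vertices from disk, refill one Vertex, call compute, write the result back."""
    vertex = Vertex()  # the only Vertex instance used in this superstep
    next_path = path + ".next"
    with open(path, "rb") as src, open(next_path, "wb") as dst:
        while True:
            chunk = src.read(RECORD.size)
            if len(chunk) < RECORD.size:
                break  # end of file
            vertex.fill(*RECORD.unpack(chunk))
            compute(vertex)  # user-supplied compute function
            dst.write(RECORD.pack(vertex.vertex_id, vertex.value))
    os.replace(next_path, path)  # the updated file becomes the next superstep's input


def read_all(path):
    """Load every record back into memory; only meant for inspecting small files."""
    with open(path, "rb") as src:
        data = src.read()
    return [RECORD.unpack_from(data, off) for off in range(0, len(data), RECORD.size)]


if __name__ == "__main__":
    def double_value(vertex):
        vertex.value *= 2

    part = os.path.join(tempfile.mkdtemp(), "part-0")
    write_partition(part, [(1, 0.5), (2, 1.5)])
    run_superstep(part, double_value)
    print(read_all(part))
```

Memory use stays constant per task because only one record is held at a time. The cost is a full sequential pass over the file in every superstep, which is presumably why the issue suggests a hybrid of disk, cached and in-memory modes.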
