[ https://issues.apache.org/jira/browse/HAMA-642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thomas Jungblut updated HAMA-642:
---------------------------------

    Affects Version/s: 0.5.0

> Make GraphRunner disk based
> ---------------------------
>
>                 Key: HAMA-642
>                 URL: https://issues.apache.org/jira/browse/HAMA-642
>             Project: Hama
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>            Assignee: Thomas Jungblut
>             Fix For: 0.6.0
>
>         Attachments: HAMA-642_unix_1.patch, HAMA-642_unix_2.patch,
> HAMA-scale_1.patch, HAMA-scale_2.patch, HAMA-scale_3.patch, HAMA-scale_4.patch
>
>
> To improve scalability, we can make the graph runner disk based.
> Which basically means:
> - We have just a single Vertex instance that gets refilled.
> - We write vertices directly to disk after partitioning.
> - In every superstep we iterate over the vertices on disk, fill the vertex
> instance, and call the user's compute function.
> Problems:
> - State other than the vertex value can't be stored easily.
> - How do we deal with random access after messages have arrived?
> So I think we should make the graph runner more hybrid, for example by using
> the queues we have implemented in the messaging. The graph runner could then
> be configured to run completely on disk, in cached mode, or in in-memory mode.
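
The disk-based superstep loop described in the issue could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the attached patches: the class and method names (DiskVertexSketch, ReusableVertex, writeVertices, superstep, readValues) and the on-disk record layout are hypothetical and are not Hama APIs. It also shows the first listed problem directly: only the fields that the vertex serializes (here id, value and edges) survive from one superstep to the next.

```java
import java.io.*;
import java.nio.file.*;

// Hypothetical sketch: none of these names are Hama APIs.
final class DiskVertexSketch {

  /** The single vertex instance that gets refilled for every record on disk. */
  static final class ReusableVertex {
    int id;
    double value;
    int[] edges = new int[0];

    void readFields(DataInput in) throws IOException {
      id = in.readInt();
      value = in.readDouble();
      edges = new int[in.readInt()];
      for (int i = 0; i < edges.length; i++) {
        edges[i] = in.readInt();
      }
    }

    void write(DataOutput out) throws IOException {
      out.writeInt(id);
      out.writeDouble(value);
      out.writeInt(edges.length);
      for (int e : edges) {
        out.writeInt(e);
      }
    }
  }

  /** Stand-in for the user's compute function. */
  interface Compute {
    void compute(ReusableVertex vertex);
  }

  /** Writes the vertices of one partition to disk: a count, then one record each. */
  static void writeVertices(Path file, int[] ids, double[] values, int[][] edges)
      throws IOException {
    ReusableVertex v = new ReusableVertex();
    try (DataOutputStream out = new DataOutputStream(
        new BufferedOutputStream(Files.newOutputStream(file)))) {
      out.writeInt(ids.length);
      for (int i = 0; i < ids.length; i++) {
        v.id = ids[i];
        v.value = values[i];
        v.edges = edges[i];
        v.write(out);
      }
    }
  }

  /**
   * One superstep: stream the vertices from disk, refill the single instance,
   * call the compute function, and write the updated vertex to a new file.
   * Returns the number of vertices processed.
   */
  static int superstep(Path current, Path next, Compute fn) throws IOException {
    ReusableVertex v = new ReusableVertex();
    try (DataInputStream in = new DataInputStream(
             new BufferedInputStream(Files.newInputStream(current)));
         DataOutputStream out = new DataOutputStream(
             new BufferedOutputStream(Files.newOutputStream(next)))) {
      int count = in.readInt();
      out.writeInt(count);
      for (int i = 0; i < count; i++) {
        v.readFields(in);  // refill instead of allocating a new vertex
        fn.compute(v);
        v.write(out);      // only serialized fields reach the next superstep
      }
      return count;
    }
  }

  /** Reads back only the vertex values, in file order. */
  static double[] readValues(Path file) throws IOException {
    ReusableVertex v = new ReusableVertex();
    try (DataInputStream in = new DataInputStream(
        new BufferedInputStream(Files.newInputStream(file)))) {
      double[] values = new double[in.readInt()];
      for (int i = 0; i < values.length; i++) {
        v.readFields(in);
        values[i] = v.value;
      }
      return values;
    }
  }

  public static void main(String[] args) throws IOException {
    Path current = Files.createTempFile("vertices", ".bin");
    Path next = Files.createTempFile("vertices", ".bin");
    writeVertices(current, new int[] {1, 2}, new double[] {1.0, 2.0},
        new int[][] {{2}, {1}});
    superstep(current, next, v -> v.value = v.value + v.edges.length);
    System.out.println(java.util.Arrays.toString(readValues(next)));
  }
}
```

The sketch only covers sequential iteration. It does not address the second problem from the issue (random access to a vertex once messages have arrived), which is presumably where the proposed cached or in-memory modes would come in.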