[
https://issues.apache.org/jira/browse/HAMA-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465866#comment-13465866
]
Thomas Jungblut commented on HAMA-642:
--------------------------------------
A race is not good. We have to investigate a bit deeper, I guess. I don't think
that there is a concurrency problem inside of jdbm, but I will have a look;
maybe there are some resources that are static. However, each task has its own
mutually exclusive "database", so I don't see a problem there.
My first guess was the use of the combiner. So here are my questions:
- Does this fail always or just sometimes?
- When it finishes, is the result wrong? Just curious, how do you compare 20 GB
of text files? ;D
- In case it is really the combiner, does pagerank work without problems?
I will build a smaller cluster in the near future to test these things more
efficiently.
> Make GraphRunner disk based
> ---------------------------
>
> Key: HAMA-642
> URL: https://issues.apache.org/jira/browse/HAMA-642
> Project: Hama
> Issue Type: Improvement
> Components: graph
> Affects Versions: 0.5.0
> Reporter: Thomas Jungblut
> Assignee: Edward J. Yoon
> Attachments: HAMA-642_unix_1.patch, HAMA-642_unix_2.patch,
> HAMA-scale_1.patch, HAMA-scale_2.patch, HAMA-scale_3.patch, HAMA-scale_4.patch
>
>
> To improve scalability we can improve the graph runner to be disk based.
> Which basically means:
> - We have just a single Vertex instance that gets refilled.
> - We directly write vertices to disk after partitioning.
> - In every superstep we iterate over the vertices on disk, fill the vertex
> instance and call the user's compute functions.
> Problems:
> - State other than the vertex value can't be stored easily
> - How do we deal with random access after messages have arrived?
> So I think we should make the graph runner more hybrid, like using the queues
> we have implemented in the messaging. So the graph runner can be configured to
> run completely on disk, in cached mode or in in-memory mode.
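
The disk-based loop from the issue description could look roughly like the
sketch below. This is only an illustration under assumptions, not the code from
the attached patches: SimpleVertex, writeVertices, superstep and readValues are
hypothetical names, the vertex state is reduced to an id and a double value, and
incoming messages are ignored, so the open question about random access after
messages have arrived is not addressed here.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

public class DiskBackedVerticesSketch {

  /** Hypothetical minimal vertex (id + value); NOT Hama's Vertex class. */
  static final class SimpleVertex {
    long id;
    double value;

    void readFields(DataInput in) throws IOException {
      id = in.readLong();
      value = in.readDouble();
    }

    void write(DataOutput out) throws IOException {
      out.writeLong(id);
      out.writeDouble(value);
    }
  }

  /** Writes the partitioned vertices straight to disk as a count plus records. */
  public static void writeVertices(Path file, long[] ids, double[] values)
      throws IOException {
    try (DataOutputStream out = new DataOutputStream(
        new BufferedOutputStream(Files.newOutputStream(file)))) {
      out.writeInt(ids.length);
      SimpleVertex v = new SimpleVertex();
      for (int i = 0; i < ids.length; i++) {
        v.id = ids[i];
        v.value = values[i];
        v.write(out);
      }
    }
  }

  /**
   * One superstep: stream over the vertices in 'current', refill the single
   * vertex instance, apply the (assumed) compute function to its value and
   * write the result to 'next'. Returns the number of vertices processed.
   */
  public static int superstep(Path current, Path next, DoubleUnaryOperator compute)
      throws IOException {
    SimpleVertex v = new SimpleVertex(); // the only vertex instance, reused
    try (DataInputStream in = new DataInputStream(
             new BufferedInputStream(Files.newInputStream(current)));
         DataOutputStream out = new DataOutputStream(
             new BufferedOutputStream(Files.newOutputStream(next)))) {
      int count = in.readInt();
      out.writeInt(count);
      for (int i = 0; i < count; i++) {
        v.readFields(in);                         // refill from disk
        v.value = compute.applyAsDouble(v.value); // stand-in for compute()
        v.write(out);                             // persist the new state
      }
      return count;
    }
  }

  /** Reads all vertex values back, only to inspect the result. */
  public static double[] readValues(Path file) throws IOException {
    try (DataInputStream in = new DataInputStream(
        new BufferedInputStream(Files.newInputStream(file)))) {
      double[] values = new double[in.readInt()];
      SimpleVertex v = new SimpleVertex();
      for (int i = 0; i < values.length; i++) {
        v.readFields(in);
        values[i] = v.value;
      }
      return values;
    }
  }

  public static void main(String[] args) throws IOException {
    Path current = Files.createTempFile("vertices", ".bin");
    Path next = Files.createTempFile("vertices", ".bin");
    writeVertices(current, new long[] {1L, 2L, 3L}, new double[] {1.0, 2.0, 3.0});
    superstep(current, next, x -> x * 2.0);
    System.out.println(Arrays.toString(readValues(next)));
  }
}
```

Only one vertex object is alive at a time, so memory use stays constant no
matter how many vertices the partition holds. The price is that every superstep
is a sequential scan over the whole file, which is why a cached or in-memory
mode remains attractive for smaller graphs.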