[jira] [Commented] (HAMA-642) Make GraphRunner disk based

Thomas Jungblut (JIRA) Sat, 15 Sep 2012 12:05:11 -0700

    [ 
https://issues.apache.org/jira/browse/HAMA-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456474#comment-13456474
 ]


Thomas Jungblut commented on HAMA-642:
--------------------------------------

For example: https://github.com/jankotek/JDBM3

bq.Blazing fast 1 million inserts / 10 million reads per second (on my 5GHz 
machine, but you should get 300000 inserts p.s. easily)

Sounds good, though I think we would have to teach it Writable serialization. 
And it is even Apache 2 licensed.
WDYT?
                
> Make GraphRunner disk based
> ---------------------------
>
>                 Key: HAMA-642
>                 URL: https://issues.apache.org/jira/browse/HAMA-642
>             Project: Hama
>          Issue Type: Improvement
>          Components: graph
>            Reporter: Thomas Jungblut
>
> To improve scalability we can improve the graph runner to be disk based.
> Which basically means:
> - We have just a single Vertex instance that gets refilled.
> - We directly write vertices to disk after partitioning
> - In every superstep we iterate over the vertices on disk, fill the vertex 
> instance and call the user's compute functions
> Problems:
> - State other than the vertex value can't be stored easily
> - How do we deal with random access after messages have arrived?
> So I think we should make the graph runner more hybrid, like using the queues 
> we have implemented in the messaging. So the graphrunner can be configured to 
> run completely on disk, in cached mode or in in-memory mode.
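
The disk-based loop described in the issue could be sketched roughly as follows. 
This is only an illustration under stated assumptions, not Hama's GraphRunner: 
the class DiskVertexSketch, its Vertex type and the helper methods are 
hypothetical, and the write/readFields pair merely mirrors the shape of Hadoop's 
Writable interface so that the example runs without Hadoop on the classpath.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.DoubleUnaryOperator;

public class DiskVertexSketch {

    /** Hypothetical vertex; write/readFields mirror the shape of Hadoop's Writable. */
    static final class Vertex {
        int id;
        double value;

        void write(DataOutput out) throws IOException {
            out.writeInt(id);
            out.writeDouble(value);
        }

        void readFields(DataInput in) throws IOException {
            id = in.readInt();
            value = in.readDouble();
        }
    }

    /** Writes vertices directly to disk, as would happen after partitioning. */
    static void spill(Path file, int[] ids, double[] values) throws IOException {
        Vertex v = new Vertex(); // single instance, refilled for every record
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            for (int i = 0; i < ids.length; i++) {
                v.id = ids[i];
                v.value = values[i];
                v.write(out);
            }
        }
    }

    /** One superstep: stream vertices from disk, refill the instance, compute, write back. */
    static void superstep(Path in, Path out, int count, DoubleUnaryOperator compute)
            throws IOException {
        Vertex v = new Vertex();
        try (DataInputStream dis = new DataInputStream(
                     new BufferedInputStream(Files.newInputStream(in)));
             DataOutputStream dos = new DataOutputStream(
                     new BufferedOutputStream(Files.newOutputStream(out)))) {
            for (int i = 0; i < count; i++) {
                v.readFields(dis);                        // refill the single instance
                v.value = compute.applyAsDouble(v.value); // stand-in for the user's compute
                v.write(dos);
            }
        }
    }

    /** Reads only the vertex values back, for inspection. */
    static double[] readValues(Path file, int count) throws IOException {
        Vertex v = new Vertex();
        double[] values = new double[count];
        try (DataInputStream dis = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            for (int i = 0; i < count; i++) {
                v.readFields(dis);
                values[i] = v.value;
            }
        }
        return values;
    }

    /** Runs two supersteps that double each vertex value, swapping files in between. */
    public static double[] demo() {
        try {
            Path current = Files.createTempFile("vertices", ".bin");
            Path next = Files.createTempFile("vertices", ".bin");
            try {
                spill(current, new int[] {1, 2, 3}, new double[] {1.0, 2.0, 3.0});
                for (int step = 0; step < 2; step++) {
                    superstep(current, next, 3, x -> x * 2.0);
                    Path tmp = current;
                    current = next;
                    next = tmp;
                }
                return readValues(current, 3);
            } finally {
                Files.deleteIfExists(current);
                Files.deleteIfExists(next);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(demo()));
    }
}
```

The sketch covers only the sequential part (one refilled instance, vertices 
streamed from disk in every superstep). It does not address the two problems 
listed in the issue: state other than the vertex value is not persisted, and 
there is no random access by vertex id once messages have arrived.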

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
