[jira] [Commented] (HAMA-642) Make GraphRunner disk based

Thomas Jungblut (JIRA) Fri, 19 Oct 2012 10:18:22 -0700

    [ https://issues.apache.org/jira/browse/HAMA-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480144#comment-13480144 ]


Thomas Jungblut commented on HAMA-642:
--------------------------------------

Yet another benchmark:

{noformat}
// writing 1gb random, 512k buffer size

Written 1024mb in 24612ms! That is 41,61mb/s!
Read 1024mb in 8701ms! That is 117,69mb/s!
1/4
Written 1024mb in 6625ms! That is 154,57mb/s!
Read 1024mb in 9424ms! That is 108,66mb/s!
2/4
Written 1024mb in 6674ms! That is 153,43mb/s!
Read 1024mb in 9479ms! That is 108,03mb/s!
3/4
Written 1024mb in 6775ms! That is 151,14mb/s!
Read 1024mb in 9294ms! That is 110,18mb/s!
4/4

// writing 512mb random, 512k buffer size

Written 512mb in 12325ms! That is 41,54mb/s!
Read 512mb in 6758ms! That is 75,76mb/s!
1/4
Written 512mb in 3346ms! That is 153,02mb/s!
Read 512mb in 4521ms! That is 113,25mb/s!
2/4
Written 512mb in 3287ms! That is 155,77mb/s!
Read 512mb in 4538ms! That is 112,83mb/s!
3/4
Written 512mb in 3293ms! That is 155,48mb/s!
Read 512mb in 4522ms! That is 113,22mb/s!
4/4

{noformat}

As you can see, the JIT is still warming up in the first iteration. In the 
last 2 iterations I see very nice disk saturation: the throughput is only 
5mb/s slower than that of a C program. I'm really happy with the performance now. 
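
For readers who want to reproduce this kind of measurement, here is a minimal sketch of a sequential write/read throughput loop. It is not the benchmark that produced the numbers above: the class name DiskThroughputSketch, its method names, and the 64mb default size are assumptions made for illustration. Only the 512k buffer size and the four repetitions are taken from the runs above.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

// Hypothetical sketch; not the code behind the numbers quoted above.
final class DiskThroughputSketch {

  /** Writes totalBytes of random data in bufferSize chunks; returns elapsed milliseconds. */
  static long writeRandom(Path file, long totalBytes, int bufferSize) throws IOException {
    byte[] chunk = new byte[bufferSize];
    new Random(42).nextBytes(chunk); // one random chunk, reused for every write
    long start = System.nanoTime();
    try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(file), bufferSize)) {
      long remaining = totalBytes;
      while (remaining > 0) {
        int n = (int) Math.min(chunk.length, remaining);
        out.write(chunk, 0, n);
        remaining -= n;
      }
    } // closing flushes the stream, but does not force the data to the device
    return (System.nanoTime() - start) / 1_000_000;
  }

  /** Reads the file sequentially in bufferSize chunks; returns the number of bytes read. */
  static long readAll(Path file, int bufferSize) throws IOException {
    byte[] chunk = new byte[bufferSize];
    long total = 0;
    try (InputStream in = new BufferedInputStream(Files.newInputStream(file), bufferSize)) {
      int n;
      while ((n = in.read(chunk)) != -1) {
        total += n;
      }
    }
    return total;
  }

  /** Converts a byte count and a duration into mb/s (1mb = 1024 * 1024 bytes). */
  static double mbPerSecond(long bytes, long millis) {
    return (bytes / (1024.0 * 1024.0)) / (Math.max(millis, 1) / 1000.0);
  }

  public static void main(String[] args) throws IOException {
    long totalBytes = 64L * 1024 * 1024; // assumption: 64mb to keep the run short
    int bufferSize = 512 * 1024;         // 512k buffer, as in the runs above
    Path file = Files.createTempFile("throughput", ".bin");
    try {
      for (int i = 1; i <= 4; i++) {     // repeat so later rounds run JIT-compiled code
        long writeMs = writeRandom(file, totalBytes, bufferSize);
        long start = System.nanoTime();
        long read = readAll(file, bufferSize);
        long readMs = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("Written %dmb in %dms! That is %.2fmb/s!%n",
            totalBytes >> 20, writeMs, mbPerSecond(totalBytes, writeMs));
        System.out.printf("Read %dmb in %dms! That is %.2fmb/s!%n",
            read >> 20, readMs, mbPerSecond(read, readMs));
        System.out.println(i + "/4");
      }
    } finally {
      Files.deleteIfExists(file);
    }
  }
}
```

Note that this sketch does not sync the file to the device and does not drop the OS page cache, so its numbers, especially for reads of small files, may reflect cache speed rather than disk speed.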
                
> Make GraphRunner disk based
> ---------------------------
>
>                 Key: HAMA-642
>                 URL: https://issues.apache.org/jira/browse/HAMA-642
>             Project: Hama
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>            Assignee: Thomas Jungblut
>         Attachments: HAMA-642_unix_1.patch, HAMA-642_unix_2.patch, 
> HAMA-642_unix_3.patch, HAMA-642_unix_4.patch, HAMA-scale_1.patch, 
> HAMA-scale_2.patch, HAMA-scale_3.patch, HAMA-scale_4.patch
>
>
> To improve scalability, we can make the graph runner disk based, which 
> basically means:
> - We have just a single Vertex instance that gets refilled.
> - We write vertices directly to disk after partitioning.
> - In every superstep we iterate over the vertices on disk, fill the vertex 
> instance and call the user's compute function.
> Problems:
> - State other than the vertex value can't be stored easily.
> - How do we deal with random access after messages have arrived?
> So I think we should make the graph runner more of a hybrid, for example by 
> using the queues we have implemented in the messaging. The graph runner could 
> then be configured to run completely on disk, in cached mode or in in-memory mode.
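
The disk-based superstep loop described in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions: SimpleVertex, Compute, writeVertices and superstep are hypothetical names, not Hama's Vertex class or GraphRunner API, and the on-disk layout (a count header followed by id, value and edge list per vertex) is made up for the example. Message delivery, and with it the random-access problem raised above, is deliberately left out.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch; none of these types are part of Hama.
final class DiskVertexSketch {

  /** Simplified vertex that can be refilled from a stream instead of being reallocated. */
  static final class SimpleVertex {
    long id;
    double value;
    long[] edges = new long[0];
    int edgeCount; // number of valid entries in edges

    void write(DataOutput out) throws IOException {
      out.writeLong(id);
      out.writeDouble(value);
      out.writeInt(edgeCount);
      for (int i = 0; i < edgeCount; i++) {
        out.writeLong(edges[i]);
      }
    }

    void readFields(DataInput in) throws IOException {
      id = in.readLong();
      value = in.readDouble();
      edgeCount = in.readInt();
      if (edges.length < edgeCount) {
        edges = new long[edgeCount]; // grow only when the current array is too small
      }
      for (int i = 0; i < edgeCount; i++) {
        edges[i] = in.readLong();
      }
    }
  }

  /** Stand-in for the user's compute function. */
  interface Compute {
    void compute(SimpleVertex vertex);
  }

  /** Writes the vertices of one partition to disk, reusing a single vertex instance. */
  static void writeVertices(Path file, long[] ids, double[] values, long[][] edges)
      throws IOException {
    SimpleVertex v = new SimpleVertex();
    try (DataOutputStream out =
        new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(file)))) {
      out.writeInt(ids.length); // header: number of vertices
      for (int i = 0; i < ids.length; i++) {
        v.id = ids[i];
        v.value = values[i];
        v.edges = edges[i];
        v.edgeCount = edges[i].length;
        v.write(out);
      }
    }
  }

  /** One superstep: stream vertices from in, refill one instance, compute, write to out. */
  static int superstep(Path in, Path out, Compute fn) throws IOException {
    SimpleVertex v = new SimpleVertex(); // the single instance that gets refilled
    try (DataInputStream dis =
            new DataInputStream(new BufferedInputStream(Files.newInputStream(in)));
        DataOutputStream dos =
            new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(out)))) {
      int n = dis.readInt();
      dos.writeInt(n);
      for (int i = 0; i < n; i++) {
        v.readFields(dis);
        fn.compute(v);
        v.write(dos); // persist the possibly updated vertex for the next superstep
      }
      return n;
    }
  }
}
```

In this sketch each superstep reads one file and writes another, so only the vertex value and edge list survive between supersteps, which is exactly the state limitation listed under "Problems".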

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
