[jira] [Commented] (HAMA-642) Make GraphRunner disk based

Thomas Jungblut (JIRA) Thu, 18 Oct 2012 07:42:08 -0700

    [ https://issues.apache.org/jira/browse/HAMA-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479043#comment-13479043 ]


Thomas Jungblut commented on HAMA-642:
--------------------------------------

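Seems like my data structure skills aren't that bad at all :P

In the read benchmark below, both disk list variants are far faster than the JDBM linked list: at 10,000,000 elements a read takes about 1.65 s (DISKLIST) and 0.76 s (CACHED_DISKLIST) versus about 52 s (JDBM_LINKEDLIST).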

{noformat}
 0% Scenario{vm=java, trial=0, benchmark=Read, length=100000, type=DISKLIST} 24247809,61 ns; σ=18784919,47 ns @ 10 trials
11% Scenario{vm=java, trial=0, benchmark=Read, length=1000000, type=DISKLIST} 172058123,00 ns; σ=85676646,72 ns @ 10 trials
22% Scenario{vm=java, trial=0, benchmark=Read, length=10000000, type=DISKLIST} 1652865166,00 ns; σ=11114290,06 ns @ 3 trials
33% Scenario{vm=java, trial=0, benchmark=Read, length=100000, type=CACHED_DISKLIST} 15961698,82 ns; σ=49664730,51 ns @ 10 trials
44% Scenario{vm=java, trial=0, benchmark=Read, length=1000000, type=CACHED_DISKLIST} 79953028,50 ns; σ=40446314,04 ns @ 10 trials
56% Scenario{vm=java, trial=0, benchmark=Read, length=10000000, type=CACHED_DISKLIST} 758473508,00 ns; σ=12260973,67 ns @ 10 trials
67% Scenario{vm=java, trial=0, benchmark=Read, length=100000, type=JDBM_LINKEDLIST} 523184277,50 ns; σ=11953102,28 ns @ 10 trials
78% Scenario{vm=java, trial=0, benchmark=Read, length=1000000, type=JDBM_LINKEDLIST} 4985694940,00 ns; σ=68198383,67 ns @ 10 trials
89% Scenario{vm=java, trial=0, benchmark=Read, length=10000000, type=JDBM_LINKEDLIST} 51962899063,00 ns; σ=203820253,57 ns @ 3 trials

           type   length      ms linear runtime
       DISKLIST   100000    24,2 =
       DISKLIST  1000000   172,1 =
       DISKLIST 10000000  1652,9 =
CACHED_DISKLIST   100000    16,0 =
CACHED_DISKLIST  1000000    80,0 =
CACHED_DISKLIST 10000000   758,5 =
JDBM_LINKEDLIST   100000   523,2 =
JDBM_LINKEDLIST  1000000  4985,7 ==
JDBM_LINKEDLIST 10000000 51962,9 ==============================
{noformat}

                
> Make GraphRunner disk based
> ---------------------------
>
>                 Key: HAMA-642
>                 URL: https://issues.apache.org/jira/browse/HAMA-642
>             Project: Hama
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>            Assignee: Thomas Jungblut
>         Attachments: HAMA-642_unix_1.patch, HAMA-642_unix_2.patch, 
> HAMA-642_unix_3.patch, HAMA-scale_1.patch, HAMA-scale_2.patch, 
> HAMA-scale_3.patch, HAMA-scale_4.patch
>
>
> To improve scalability, we can make the graph runner disk based.
> This basically means:
> - We have just a single Vertex instance that gets refilled.
> - We write vertices directly to disk after partitioning.
> - In every superstep we iterate over the vertices on disk, fill the vertex 
> instance and call the user's compute function.
> Problems:
> - State other than the vertex value can't be stored easily.
> - How do we deal with random access after messages have arrived?
> So I think we should make the graph runner more of a hybrid, like the queues 
> we have implemented in the messaging, so that the graph runner can be 
> configured to run completely on disk, in cached mode, or in in-memory mode.
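
The disk-based loop described in the issue (write vertices once, then refill a single vertex instance on every superstep) could look roughly like the sketch below. It is only an illustration under assumptions: `DiskVertexSketch`, `SimpleVertex`, `Compute`, `writeVertices` and `superstep` are hypothetical names, the record layout is made up, and none of this is taken from the attached patches or the Hama API.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical sketch; none of these names come from Hama.
final class DiskVertexSketch {

  /** Minimal stand-in for a vertex: id, value and outgoing edge targets. */
  static final class SimpleVertex {
    int id;
    double value;
    int[] edges = new int[0];

    /** Refills this instance from the stream instead of creating a new vertex. */
    void readFields(DataInputStream in) throws IOException {
      id = in.readInt();
      value = in.readDouble();
      int n = in.readInt();
      if (edges.length != n) {
        edges = new int[n]; // only reallocated when the edge count changes
      }
      for (int i = 0; i < n; i++) {
        edges[i] = in.readInt();
      }
    }

    /** Appends this vertex as one record: id, value, edge count, edges. */
    void write(DataOutputStream out) throws IOException {
      out.writeInt(id);
      out.writeDouble(value);
      out.writeInt(edges.length);
      for (int e : edges) {
        out.writeInt(e);
      }
    }
  }

  /** Stand-in for the user's compute function. */
  interface Compute {
    void compute(SimpleVertex vertex);
  }

  /** Writes all vertices sequentially to disk, prefixed by the vertex count. */
  static void writeVertices(File file, int[] ids, double[] values, int[][] edges)
      throws IOException {
    SimpleVertex v = new SimpleVertex();
    try (DataOutputStream out = new DataOutputStream(
        new BufferedOutputStream(new FileOutputStream(file)))) {
      out.writeInt(ids.length);
      for (int i = 0; i < ids.length; i++) {
        v.id = ids[i];
        v.value = values[i];
        v.edges = edges[i];
        v.write(out);
      }
    }
  }

  /** One superstep: streams over the file and reuses a single vertex instance. */
  static int superstep(File file, Compute fn) throws IOException {
    SimpleVertex v = new SimpleVertex();
    try (DataInputStream in = new DataInputStream(
        new BufferedInputStream(new FileInputStream(file)))) {
      int count = in.readInt();
      for (int i = 0; i < count; i++) {
        v.readFields(in);
        fn.compute(v);
      }
      return count;
    }
  }
}
```

Note that this pass is read-only and strictly sequential: changes made inside `compute` are not written back, and there is no way to jump to a particular vertex. Those are exactly the two problems listed in the issue (storing state and random access after messages arrive), which is why a hybrid with cached and in-memory modes is proposed.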

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
