[ https://issues.apache.org/jira/browse/HAMA-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479043#comment-13479043 ]
Thomas Jungblut commented on HAMA-642:
--------------------------------------
Seems like my data structure skills aren't that bad at all :P
{noformat}
0% Scenario{vm=java, trial=0, benchmark=Read, length=100000, type=DISKLIST}
24247809,61 ns; σ=18784919,47 ns @ 10 trials
11% Scenario{vm=java, trial=0, benchmark=Read, length=1000000, type=DISKLIST}
172058123,00 ns; σ=85676646,72 ns @ 10 trials
22% Scenario{vm=java, trial=0, benchmark=Read, length=10000000, type=DISKLIST}
1652865166,00 ns; σ=11114290,06 ns @ 3 trials
33% Scenario{vm=java, trial=0, benchmark=Read, length=100000, type=CACHED_DISKLIST}
15961698,82 ns; σ=49664730,51 ns @ 10 trials
44% Scenario{vm=java, trial=0, benchmark=Read, length=1000000, type=CACHED_DISKLIST}
79953028,50 ns; σ=40446314,04 ns @ 10 trials
56% Scenario{vm=java, trial=0, benchmark=Read, length=10000000, type=CACHED_DISKLIST}
758473508,00 ns; σ=12260973,67 ns @ 10 trials
67% Scenario{vm=java, trial=0, benchmark=Read, length=100000, type=JDBM_LINKEDLIST}
523184277,50 ns; σ=11953102,28 ns @ 10 trials
78% Scenario{vm=java, trial=0, benchmark=Read, length=1000000, type=JDBM_LINKEDLIST}
4985694940,00 ns; σ=68198383,67 ns @ 10 trials
89% Scenario{vm=java, trial=0, benchmark=Read, length=10000000, type=JDBM_LINKEDLIST}
51962899063,00 ns; σ=203820253,57 ns @ 3 trials
           type   length      ms linear runtime
       DISKLIST   100000    24,2 =
       DISKLIST  1000000   172,1 =
       DISKLIST 10000000  1652,9 =
CACHED_DISKLIST   100000    16,0 =
CACHED_DISKLIST  1000000    80,0 =
CACHED_DISKLIST 10000000   758,5 =
JDBM_LINKEDLIST   100000   523,2 =
JDBM_LINKEDLIST  1000000  4985,7 ==
JDBM_LINKEDLIST 10000000 51962,9 ==============================
{noformat}
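To put the table in relative terms, here is a small sketch that computes the slowdown factors for the length=10000000 row. The class and method names are made up for illustration; only the three millisecond values come from the table above. At that size JDBM_LINKEDLIST comes out roughly 30x slower than DISKLIST and nearly 70x slower than CACHED_DISKLIST. The smaller sizes are noisier (for CACHED_DISKLIST at length=100000 the reported σ is larger than the mean), so the ratios there should be read with more caution.

```java
// Illustrative only: values are the mean read times (ms) reported above
// for length=10000000. Class and method names are hypothetical.
class ReadBenchmarkRatios {

  /** Returns how many times slower slowMs is compared to fastMs. */
  static double slowdown(double slowMs, double fastMs) {
    return slowMs / fastMs;
  }

  public static void main(String[] args) {
    double diskList = 1652.9;
    double cachedDiskList = 758.5;
    double jdbmLinkedList = 51962.9;

    System.out.printf("JDBM_LINKEDLIST vs DISKLIST:        %.1fx%n",
        slowdown(jdbmLinkedList, diskList));
    System.out.printf("JDBM_LINKEDLIST vs CACHED_DISKLIST: %.1fx%n",
        slowdown(jdbmLinkedList, cachedDiskList));
    System.out.printf("DISKLIST vs CACHED_DISKLIST:        %.1fx%n",
        slowdown(diskList, cachedDiskList));
  }
}
```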
> Make GraphRunner disk based
> ---------------------------
>
> Key: HAMA-642
> URL: https://issues.apache.org/jira/browse/HAMA-642
> Project: Hama
> Issue Type: Improvement
> Components: graph
> Affects Versions: 0.5.0
> Reporter: Thomas Jungblut
> Assignee: Thomas Jungblut
> Attachments: HAMA-642_unix_1.patch, HAMA-642_unix_2.patch,
> HAMA-642_unix_3.patch, HAMA-scale_1.patch, HAMA-scale_2.patch,
> HAMA-scale_3.patch, HAMA-scale_4.patch
>
>
> To improve scalability, we can make the graph runner disk based,
> which basically means:
> - We have just a single Vertex instance that gets refilled.
> - We write vertices directly to disk after partitioning.
> - In every superstep we iterate over the vertices on disk, fill the vertex
> instance and call the user's compute function.
> Problems:
> - State other than the vertex value can't be stored easily.
> - How do we deal with random access after messages have arrived?
> So I think we should make the graph runner more of a hybrid, like the queues
> we have implemented in the messaging, so that the graph runner can be configured to
> run completely on disk, in cached mode, or in in-memory mode.
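The disk-based loop described in the issue could look roughly like the sketch below. This is not the code from the attached patches and not Hama's actual graph runner API: `DiskSuperstepSketch`, `ReusableVertex`, `Compute` and the on-disk record layout (a count followed by id/value pairs) are all assumptions made for illustration. It shows the three listed points: vertices are written sequentially to disk, a single vertex instance is refilled per record, and each superstep streams over the file and calls the user's compute function.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: names and file layout are assumptions, not Hama's API.
class DiskSuperstepSketch {

  /** The single vertex instance that gets refilled for every record on disk. */
  static final class ReusableVertex {
    long id;
    double value;

    void readFields(DataInputStream in) throws IOException {
      id = in.readLong();
      value = in.readDouble();
    }

    void write(DataOutputStream out) throws IOException {
      out.writeLong(id);
      out.writeDouble(value);
    }
  }

  /** Stand-in for the user's compute function. */
  interface Compute {
    void compute(ReusableVertex vertex);
  }

  /** Writes vertices sequentially to disk, as would happen after partitioning. */
  static void writeVertices(Path file, long[] ids, double[] values) throws IOException {
    try (DataOutputStream out = new DataOutputStream(
        new BufferedOutputStream(Files.newOutputStream(file)))) {
      out.writeInt(ids.length);
      for (int i = 0; i < ids.length; i++) {
        out.writeLong(ids[i]);
        out.writeDouble(values[i]);
      }
    }
  }

  /** One superstep: stream the input file, refill the vertex, compute, write the result. */
  static void superstep(Path input, Path output, Compute fn) throws IOException {
    ReusableVertex vertex = new ReusableVertex(); // allocated once, reused for all records
    try (DataInputStream in = new DataInputStream(
            new BufferedInputStream(Files.newInputStream(input)));
        DataOutputStream out = new DataOutputStream(
            new BufferedOutputStream(Files.newOutputStream(output)))) {
      int count = in.readInt();
      out.writeInt(count);
      for (int i = 0; i < count; i++) {
        vertex.readFields(in);
        fn.compute(vertex);
        vertex.write(out);
      }
    }
  }

  /** Sums all vertex values in a file, again with a single reused instance. */
  static double sumValues(Path file) throws IOException {
    ReusableVertex vertex = new ReusableVertex();
    double sum = 0;
    try (DataInputStream in = new DataInputStream(
        new BufferedInputStream(Files.newInputStream(file)))) {
      int count = in.readInt();
      for (int i = 0; i < count; i++) {
        vertex.readFields(in);
        sum += vertex.value;
      }
    }
    return sum;
  }

  public static void main(String[] args) throws IOException {
    Path before = Files.createTempFile("vertices-in", ".bin");
    Path after = Files.createTempFile("vertices-out", ".bin");
    try {
      writeVertices(before, new long[] {1, 2, 3}, new double[] {1.0, 2.0, 3.0});
      superstep(before, after, v -> v.value *= 2); // toy compute: double each value
      System.out.println("sum after superstep: " + sumValues(after));
    } finally {
      Files.deleteIfExists(before);
      Files.deleteIfExists(after);
    }
  }
}
```

The sketch only covers sequential access. It leaves both problems from the issue open: state other than the vertex value is not persisted, and there is no random access to deliver messages to a specific vertex, which is where a cached or in-memory mode would come in.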