[ https://issues.apache.org/jira/browse/HAMA-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478564#comment-13478564 ]

Edward J. Yoon commented on HAMA-642:
-------------------------------------

I just tested both the trunk version and the patch-applied version again. The 
patch is still unstable.

 - Input size: 1167771205 
 - Tasks per node: 10
 - Physical nodes: 18
 - bsp.child.java.opts: 4GB

The job (patch-applied version) always fails without a specific error message.

{code}
attempt_201210171757_0001_000047_0, attempt_201210171757_0001_000134_0, 
attempt_201210171757_0001_000062_0, attempt_201210171757_0001_000149_0, 
attempt_201210171757_0001_000046_0, attempt_201210171757_0001_000117_0, 
attempt_201210171757_0001_000041_0, attempt_201210171757_0001_000136_0, 
attempt_201210171757_0001_000140_0, attempt_201210171757_0001_000098_0, 
attempt_201210171757_0001_000103_0, attempt_201210171757_0001_000120_0, 
attempt_201210171757_0001_000036_0, attempt_201210171757_0001_000099_0, 
attempt_201210171757_0001_000045_0, attempt_201210171757_0001_000066_0, 
attempt_201210171757_0001_000160_0]
12/10/17 17:57:15 DEBUG sync.ZooKeeperSyncClientImpl: leaveBarrier(): 
superstep:0 taskid:attempt_201210171757_0001_000080_0 wait for lowest notify.
12/10/17 17:57:15 DEBUG sync.ZooKeeperSyncClientImpl: leaveBarrier() at 
superstep: 0 taskid:attempt_201210171757_0001_000080_0 lowest notify other 
nodes.
12/10/17 17:57:15 DEBUG sync.ZooKeeperSyncClientImpl: leaveBarrier() !!! 
checking znodes contnains /ready node or not: at superstep:0 znode:[ready]
12/10/17 17:57:15 DEBUG sync.ZooKeeperSyncClientImpl: leaveBarrier() at 
superstep:0 znode size: (0) znodes:[]
12/10/17 17:57:15 DEBUG bsp.Counters: Adding TIME_IN_SYNC_MS
12/10/17 17:57:15 DEBUG message.AbstractMessageManager: Creating new class 
org.apache.hama.bsp.message.MemoryQueue
12/10/17 17:57:15 INFO ipc.Server: Stopping server on 61004
12/10/17 17:57:15 INFO ipc.Server: IPC Server handler 0 on 61004: exiting
12/10/17 17:57:15 INFO ipc.Server: Stopping IPC Server listener on 61004
12/10/17 17:57:15 INFO ipc.Server: Stopping IPC Server Responder
12/10/17 17:57:15 ERROR bsp.BSPTask: Shutting down ping service.
{code}
                
> Make GraphRunner disk based
> ---------------------------
>
>                 Key: HAMA-642
>                 URL: https://issues.apache.org/jira/browse/HAMA-642
>             Project: Hama
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>            Assignee: Thomas Jungblut
>         Attachments: HAMA-642_unix_1.patch, HAMA-642_unix_2.patch, 
> HAMA-642_unix_3.patch, HAMA-scale_1.patch, HAMA-scale_2.patch, 
> HAMA-scale_3.patch, HAMA-scale_4.patch
>
>
> To improve scalability, we can make the graph runner disk based, 
> which basically means:
> - We have just a single Vertex instance that gets refilled.
> - We write vertices directly to disk after partitioning.
> - In every superstep we iterate over the vertices on disk, fill the vertex 
> instance and call the user's compute functions.
> Problems:
> - State other than the vertex value can't be stored easily.
> - How do we deal with random access after messages have arrived?
> So I think we should make the graph runner more hybrid, for example by using 
> the queues we have implemented in the messaging. The graph runner could then 
> be configured to run completely on disk, in cached mode or in in-memory mode.
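
For illustration, the disk-based loop described in the issue could look roughly 
like the sketch below. It is not taken from any of the attached patches: the 
class DiskVertexSketch, its Vertex and Compute types, the record layout (an int 
count followed by id/value pairs) and the use of temp files are all assumptions 
made up for this example, and it uses only java.io / java.nio, no Hama APIs.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical sketch: none of these names are Hama APIs.
final class DiskVertexSketch {

  /** Minimal vertex: an id plus a value, the only state that is persisted. */
  static final class Vertex {
    long id;
    double value;

    void readFields(DataInput in) throws IOException {
      id = in.readLong();
      value = in.readDouble();
    }

    void write(DataOutput out) throws IOException {
      out.writeLong(id);
      out.writeDouble(value);
    }
  }

  /** Stand-in for the user's compute function. */
  interface Compute {
    void compute(Vertex vertex, long superstep);
  }

  /** Writes vertices to disk, as would happen right after partitioning. */
  static void writeVertices(Path file, long[] ids, double[] values) throws IOException {
    try (DataOutputStream out = new DataOutputStream(
        new BufferedOutputStream(Files.newOutputStream(file)))) {
      out.writeInt(ids.length); // record count header
      Vertex vertex = new Vertex();
      for (int i = 0; i < ids.length; i++) {
        vertex.id = ids[i];
        vertex.value = values[i];
        vertex.write(out);
      }
    }
  }

  /** One superstep: stream vertices from 'in', compute, write results to a different file 'out'. */
  static void superstep(Path in, Path out, long superstep, Compute fn) throws IOException {
    try (DataInputStream din = new DataInputStream(
             new BufferedInputStream(Files.newInputStream(in)));
         DataOutputStream dout = new DataOutputStream(
             new BufferedOutputStream(Files.newOutputStream(out)))) {
      int count = din.readInt();
      dout.writeInt(count);
      Vertex vertex = new Vertex(); // the single instance that gets refilled
      for (int i = 0; i < count; i++) {
        vertex.readFields(din);        // fill the instance from disk
        fn.compute(vertex, superstep); // call the user's compute function
        vertex.write(dout);            // persist the updated vertex value
      }
    }
  }

  /** Reads back only the vertex values, for inspection. */
  static double[] readValues(Path file) throws IOException {
    try (DataInputStream din = new DataInputStream(
        new BufferedInputStream(Files.newInputStream(file)))) {
      double[] values = new double[din.readInt()];
      Vertex vertex = new Vertex();
      for (int i = 0; i < values.length; i++) {
        vertex.readFields(din);
        values[i] = vertex.value;
      }
      return values;
    }
  }

  public static void main(String[] args) throws IOException {
    Path current = Files.createTempFile("vertices-0-", ".bin");
    Path next = Files.createTempFile("vertices-1-", ".bin");
    writeVertices(current, new long[] {1, 2, 3}, new double[] {1.0, 2.0, 3.0});
    superstep(current, next, 0, (v, step) -> v.value *= 2);
    System.out.println(Arrays.toString(readValues(next)));
    Files.delete(current);
    Files.delete(next);
  }
}
```

The sketch only covers the sequential pass. It stores nothing but the vertex 
value between supersteps and offers no random access for incoming messages, 
which are exactly the two problems listed in the issue description.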

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
