[
https://issues.apache.org/jira/browse/HAMA-454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128743#comment-13128743
]
Thomas Jungblut commented on HAMA-454:
--------------------------------------
Boom.
I am running into a locking issue in the barrier with two tasks:
Task 1
{noformat}
11/10/17 11:15:00 INFO zookeeper.ClientCnxn: Session establishment complete on
server localhost/127.0.0.1:21810, sessionid = 0x13311060315000d, negotiated
timeout = 1200000
11/10/17 11:15:00 INFO bsp.YarnSerializePrinting$HelloBSP:
[Ljava.lang.String;@42787d6a
11/10/17 11:15:00 INFO bsp.YarnSerializePrinting$HelloBSP: Hello BSP from 1 of
2: localhost.localdomain:16002
11/10/17 11:15:01 INFO bsp.BSPPeerImpl: xxxx 1. At superstep: 0 which task is
waiting? attempt_appattempt_1318835555330_0018_000001_0000_000001_1 stat is
null? null
11/10/17 11:15:02 INFO bsp.BSPPeerImpl: leaveBarrier() !!! checking znodes
contnains /ready node or not: at superstep:0
znode:[attempt_appattempt_1318835555330_0018_000001_0000_000001_1,
attempt_appattempt_1318835555330_0018_000001_0000_000000_0, ready]
11/10/17 11:15:02 INFO bsp.BSPPeerImpl: leaveBarrier() at superstep:0 znode
size: (2) znodes:[attempt_appattempt_1318835555330_0018_000001_0000_000001_1,
attempt_appattempt_1318835555330_0018_000001_0000_000000_0]
11/10/17 11:15:02 INFO bsp.BSPPeerImpl: leaveBarrier() at superstep: 0
taskid:attempt_appattempt_1318835555330_0018_000001_0000_000001_1 lowest:
attempt_appattempt_1318835555330_0018_000001_0000_000000_0
highest:attempt_appattempt_1318835555330_0018_000001_0000_000001_1
11/10/17 11:15:02 INFO bsp.BSPPeerImpl: leaveBarrier() znode at superstep:0
taskid:attempt_appattempt_1318835555330_0018_000001_0000_000001_1 exists, so
delete it.
11/10/17 11:15:02 INFO bsp.BSPPeerImpl: leaveBarrier() !!! checking znodes
contnains /ready node or not: at superstep:0 znode:[ready]
11/10/17 11:15:02 INFO bsp.BSPPeerImpl: leaveBarrier() at superstep:0 znode
size: (0) znodes:[]
{noformat}
Task 2
{noformat}
11/10/17 11:15:00 INFO bsp.YarnSerializePrinting$HelloBSP:
[Ljava.lang.String;@df4cbee
11/10/17 11:15:02 INFO bsp.BSPPeerImpl: leaveBarrier() !!! checking znodes
contnains /ready node or not: at superstep:0
znode:[attempt_appattempt_1318835555330_0018_000001_0000_000001_1,
attempt_appattempt_1318835555330_0018_000001_0000_000000_0, ready]
11/10/17 11:15:02 INFO bsp.BSPPeerImpl: leaveBarrier() at superstep:0 znode
size: (2) znodes:[attempt_appattempt_1318835555330_0018_000001_0000_000001_1,
attempt_appattempt_1318835555330_0018_000001_0000_000000_0]
11/10/17 11:15:02 INFO bsp.BSPPeerImpl: leaveBarrier() at superstep: 0
taskid:attempt_appattempt_1318835555330_0018_000001_0000_000000_0 lowest:
attempt_appattempt_1318835555330_0018_000001_0000_000000_0
highest:attempt_appattempt_1318835555330_0018_000001_0000_000001_1
11/10/17 11:15:02 INFO bsp.BSPPeerImpl: leaveBarrier() !!! checking znodes
contnains /ready node or not: at superstep:0
znode:[attempt_appattempt_1318835555330_0018_000001_0000_000000_0, ready]
11/10/17 11:15:02 INFO bsp.BSPPeerImpl: leaveBarrier() at superstep:0 znode
size: (1) znodes:[attempt_appattempt_1318835555330_0018_000001_0000_000000_0]
{noformat}
And it hangs forever.
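For reference, the leave step of the ZooKeeper double barrier recipe, as I understand it, can be sketched as a pure decision function. This is not the code from the patch: the class, enum and method names are hypothetical, and filtering out the "ready" node is an assumption taken from the log output above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the "leave" decision in a double barrier.
// It only decides what a task should do next; it does not talk to ZooKeeper.
final class LeaveBarrierSketch {

    // Assumed name of the marker node that is not a task znode.
    static final String READY = "ready";

    enum Action {
        EXIT,                          // no task znodes left
        DELETE_OWN_AND_EXIT,           // we are the only task znode left
        WAIT_ON_HIGHEST,               // we are the lowest: wait for the highest to go away
        DELETE_OWN_AND_WAIT_ON_LOWEST  // otherwise: delete our znode, wait for the lowest
    }

    static Action decide(List<String> children, String self) {
        // Copy so the caller's list is untouched, then drop the marker node.
        List<String> tasks = new ArrayList<>(children);
        tasks.remove(READY);

        if (tasks.isEmpty()) {
            return Action.EXIT;
        }
        if (tasks.size() == 1 && tasks.get(0).equals(self)) {
            return Action.DELETE_OWN_AND_EXIT;
        }
        // Lexicographic order decides which znode is "lowest".
        Collections.sort(tasks);
        String lowest = tasks.get(0);
        if (self.equals(lowest)) {
            return Action.WAIT_ON_HIGHEST;
        }
        return Action.DELETE_OWN_AND_WAIT_ON_LOWEST;
    }
}
```

With the child lists from the logs above, Task 1 lands in DELETE_OWN_AND_WAIT_ON_LOWEST and Task 2 first lands in WAIT_ON_HIGHEST, which matches what is printed. The last list printed by Task 2 (its own znode plus ready) would map to DELETE_OWN_AND_EXIT in this sketch, so that may be the branch worth checking.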
I use the app attempt ID as the parent znode, and for each task I create a
child znode named after its host:port pair.
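The layout described above might look roughly like this. The helper class and the ROOT prefix are hypothetical and only illustrate the naming; the real paths in the patch may differ.

```java
// Hypothetical helper that only builds znode path strings.
final class BarrierZnodeLayout {

    // Assumed root prefix; the actual root used by the patch is not shown here.
    static final String ROOT = "/bsp";

    // Parent znode for the whole job, named after the application attempt id.
    static String barrierPath(String appAttemptId) {
        return ROOT + "/" + appAttemptId;
    }

    // One child znode per task, named after the task's host:port pair.
    static String taskPath(String appAttemptId, String host, int port) {
        return barrierPath(appAttemptId) + "/" + host + ":" + port;
    }
}
```

For example, `BarrierZnodeLayout.taskPath("appattempt_1318835555330_0018_000001", "localhost.localdomain", 16002)` yields `/bsp/appattempt_1318835555330_0018_000001/localhost.localdomain:16002`.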
Do you know what I did wrong?
I will provide you with the patch.
> Add Zookeeper as synchronization service
> ----------------------------------------
>
> Key: HAMA-454
> URL: https://issues.apache.org/jira/browse/HAMA-454
> Project: Hama
> Issue Type: Sub-task
> Reporter: Thomas Jungblut
>
> We should use Zookeeper instead of our own implementation.
> Additionally we should use the plain BSPPeerImpl in YARN to reduce duplicate
> code.