[
https://issues.apache.org/jira/browse/HAMA-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215572#comment-13215572
]
Suraj Menon commented on HAMA-498:
----------------------------------
While testing with fault injection at different points, I found a few issues. I
am changing the implementation of BSPTask to conform to the BSP documentation.
A failure in the bsp function currently skips the cleanup function call.
Current code:
{noformat}
private final <KEYIN, VALUEIN, KEYOUT, VALUEOUT, M extends Writable> void runBSP(
    final BSPJob job,
    BSPPeerImpl<KEYIN, VALUEIN, KEYOUT, VALUEOUT, M> bspPeer,
    final BytesWritable rawSplit, final BSPPeerProtocol umbilical)
    throws IOException, SyncException, ClassNotFoundException,
    InterruptedException {

  BSP<KEYIN, VALUEIN, KEYOUT, VALUEOUT, M> bsp =
      (BSP<KEYIN, VALUEIN, KEYOUT, VALUEOUT, M>) ReflectionUtils.newInstance(
          job.getConf().getClass("bsp.work.class", BSP.class), job.getConf());

  bsp.setup(bspPeer);
  bsp.bsp(bspPeer);
  bsp.cleanup(bspPeer);
  bspPeer.close();
}
{noformat}
Proposed change:
{noformat}
private final <KEYIN, VALUEIN, KEYOUT, VALUEOUT, M extends Writable> void runBSP(
    final BSPJob job,
    BSPPeerImpl<KEYIN, VALUEIN, KEYOUT, VALUEOUT, M> bspPeer,
    final BytesWritable rawSplit, final BSPPeerProtocol umbilical)
    throws IOException, SyncException, ClassNotFoundException,
    InterruptedException {

  BSP<KEYIN, VALUEIN, KEYOUT, VALUEOUT, M> bsp =
      (BSP<KEYIN, VALUEIN, KEYOUT, VALUEOUT, M>) ReflectionUtils.newInstance(
          job.getConf().getClass("bsp.work.class", BSP.class), job.getConf());

  bsp.setup(bspPeer);
  try {
    bsp.bsp(bspPeer);
  } finally {
    try {
      bsp.cleanup(bspPeer);
    } finally {
      // Should we trust close() not to throw? If it does, its exception
      // masks any earlier exception from bsp() or cleanup(), so we may need
      // to check for the earlier exception and rethrow that one instead.
      bspPeer.close();
    }
  }
}
{noformat}
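On the open question in the code comment above (an exception from close() masking an earlier one), here is a minimal, self-contained sketch of one possible approach. It is not Hama code: the Step interface and the runAll method are hypothetical names introduced only for illustration, and Throwable.addSuppressed requires Java 7 or later, which may not match the project's target JDK.

```java
public class LifecycleSketch {

  /** Hypothetical stand-in for one lifecycle step (bsp, cleanup or close). */
  public interface Step {
    void run() throws Exception;
  }

  /**
   * Runs body, then cleanup, then close, in that order. A later step still
   * runs when an earlier one throws an Exception. The first Exception is
   * rethrown at the end; later ones are attached to it as suppressed
   * exceptions instead of replacing it. An Error is not caught here and
   * propagates immediately.
   */
  public static void runAll(Step body, Step cleanup, Step close) throws Exception {
    Exception primary = null;
    for (Step step : new Step[] {body, cleanup, close}) {
      try {
        step.run();
      } catch (Exception e) {
        if (primary == null) {
          primary = e;               // remember the first failure
        } else if (e != primary) {
          primary.addSuppressed(e);  // keep later failures without masking the first
        }
      }
    }
    if (primary != null) {
      throw primary;
    }
  }
}
```

In runBSP this would correspond to passing bsp.bsp(bspPeer), bsp.cleanup(bspPeer) and bspPeer.close() as the three steps, so a failure in close() would show up under getSuppressed() of the original exception rather than hiding it.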
Let me know if you have any comments on it. I will make any necessary changes
before I upload the patch.
> BSPTask should periodically ping its parent.
> --------------------------------------------
>
> Key: HAMA-498
> URL: https://issues.apache.org/jira/browse/HAMA-498
> Project: Hama
> Issue Type: Sub-task
> Components: bsp
> Affects Versions: 0.4.0
> Reporter: Edward J. Yoon
> Assignee: Suraj Menon
> Labels: newbie
> Fix For: 0.5.0
>
>
> As described in http://wiki.apache.org/hama/GroomServerFaultTolerance
> BSPTask should periodically ping its parent 'GroomServer' for their health
> status.
> 1. If a Task is unable to ping its parent 'GroomServer', it should kill
> itself.
> 2. If the GroomServer does not receive a ping from a child, the GroomServer
> should check whether that child is still running.
> You don't need to implement recovery logic in this issue.