On Nov 22, 2006, at 9:00 AM, Nigel Daley wrote:
> Arun, the proposal looks good. If the JT always gets a stale seqNo from the TT (because of some unrecoverable problem in the TT), will it send the saved response forever? Or should there be some maximum resends?
I think that if the SeqNo doesn't match, the heartbeat shouldn't count toward the 10-minute task tracker timeout. That way, a task tracker that gets stuck will be declared lost after 10 minutes.
> Also, when the JT is resending a JTResponse, can it add or change the list of actions? Or do they need to be identical?
For a first pass I'd require that they be identical. If the actions change, you need to assign a new SeqNo and track both the old and new SeqNo. Furthermore, piling more work on a task tracker that is running behind doesn't sound like a good strategy.
> Is it possible that a TT can get the same JTResponse more than once? If so, does the TT need to recognize this?
No. The RPC framework and the fact that only one task will be sending heartbeats will prevent that.
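
A minimal sketch of the JT-side rule discussed above, in Java. It is not the actual JobTracker code: the class HeartbeatSketch, its methods, and the use of plain strings for actions are all hypothetical names chosen for illustration. It assumes the TT sends back the SeqNo of the last response it processed. A heartbeat with a stale SeqNo gets the saved response back unchanged and does not refresh the timeout clock.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the resend rule; not the real JobTracker. */
class HeartbeatSketch {
    // The 10-minute task tracker timeout mentioned in the thread.
    static final long TRACKER_TIMEOUT_MS = 10 * 60 * 1000L;

    /** A response as saved by the JT; actions are plain strings here (assumption). */
    static final class Response {
        final int seqNo;
        final List<String> actions;

        Response(int seqNo, List<String> actions) {
            this.seqNo = seqNo;
            this.actions = Collections.unmodifiableList(new ArrayList<>(actions));
        }
    }

    /** Per-tracker state: the saved response and the time of the last in-sync heartbeat. */
    private static final class TrackerState {
        Response saved = new Response(0, Collections.<String>emptyList());
        long lastGoodHeartbeatMs;

        TrackerState(long nowMs) {
            this.lastGoodHeartbeatMs = nowMs;
        }
    }

    private final Map<String, TrackerState> trackers = new HashMap<>();

    /** Handles one heartbeat; ttSeqNo is the SeqNo of the last response the TT processed. */
    Response heartbeat(String tracker, int ttSeqNo, long nowMs, List<String> newActions) {
        TrackerState st = trackers.computeIfAbsent(tracker, k -> new TrackerState(nowMs));
        if (ttSeqNo != st.saved.seqNo) {
            // Stale SeqNo: resend the identical saved response and leave the
            // timeout clock alone, so a stuck tracker still expires.
            return st.saved;
        }
        // In sync: refresh the clock and issue a new response under a new SeqNo.
        st.lastGoodHeartbeatMs = nowMs;
        st.saved = new Response(st.saved.seqNo + 1, newActions);
        return st.saved;
    }

    /** True if the tracker is unknown or has had no in-sync heartbeat within the timeout. */
    boolean isLost(String tracker, long nowMs) {
        TrackerState st = trackers.get(tracker);
        return st == null || nowMs - st.lastGoodHeartbeatMs > TRACKER_TIMEOUT_MS;
    }
}
```

With this shape, no explicit resend limit is needed: a tracker that keeps reporting a stale SeqNo just keeps getting the same response until the timeout marks it lost.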
-- Owen