On Wed, Oct 22, 2008 at 7:28 AM, Tammo van Lessen <[EMAIL PROTECTED]> wrote:
> Hi guys,
>
> there is one open issue regarding the extension activity implementation
> I'd like to discuss with you.
>
> Currently, the extension framework supports two models, a) to implement
> an extension operation for a very short running, synchronous operation
> and b) to give the developer a bit more control of when to complete the
> activity when implementing in an asynchronous manner (i.e. directly
> calling a complete(), completeWithFault() method vs. having this logic
> wrapping a runSync() method which is implemented by the extension
> developer).
>
> complete() and completeWithFault() use Jacob channels to notify that the
> activity has been completed.
>
> Now the question: The approach described above works more or less fine
> for short running activity implementation. The problem I see is how to
> deal with an engine crash after the extension activity has been started
> and before it has been completed.

Just making sure I understand correctly: that's only for the asynchronous cases, correct? That is, when people rely on the completion channel.

> In this case we've a problem as we can
> not recover the extension code when the PI's state has been recovered.
> Thus, we're waiting forever until the extension activity completes.
>

I implemented something similar for invoke a few weeks ago. The problem is pretty much the same (IIUC): if the server crashes while an invoke is taking place, we never get a reply (assuming it's a two-way) and the invoke just hangs there. I fixed this by scheduling a task that checks the invoke after the timeout period. If we get the reply properly, the task is cancelled (I had to add a cancel operation on the scheduler). If you really get a timeout, the invoke enters normal recovery and the task will just get discarded when it executes. But if the server crashes, then when it gets restarted the task will trigger a check of the invoke, see that it hasn't got any reply and that it's not in recovery, and force the recovery.

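To make the timeout check concrete, here is a minimal sketch, written in Python for brevity rather than ODE's Java. Everything in it (Scheduler, Exchange, State, start_invoke, on_reply, run_due) is a hypothetical stand-in and not ODE's actual scheduler or message exchange API; the `tasks` dict plays the role of a job table that survives a restart.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    WAITING = "waiting"      # request sent, no reply yet
    COMPLETED = "completed"  # reply arrived
    RECOVERY = "recovery"    # handed over to activity recovery


@dataclass
class Exchange:
    """Hypothetical stand-in for the message exchange kept outside the process."""
    state: State = State.WAITING


@dataclass
class Scheduler:
    """Toy scheduler; `tasks` is assumed to be persisted across a crash."""
    tasks: dict = field(default_factory=dict)  # exchange id -> due time

    def schedule(self, mex_id, due):
        self.tasks[mex_id] = due

    def cancel(self, mex_id):
        self.tasks.pop(mex_id, None)

    def run_due(self, now, exchanges):
        """Run every due check; return the ids that were forced into recovery."""
        forced = []
        for mex_id, due in list(self.tasks.items()):
            if due > now:
                continue
            del self.tasks[mex_id]
            mex = exchanges[mex_id]
            # Completed or already-recovering invokes: the task is just discarded.
            if mex.state is State.WAITING:
                mex.state = State.RECOVERY
                forced.append(mex_id)
        return forced


def start_invoke(scheduler, exchanges, mex_id, now, timeout):
    """Record the exchange and schedule the check for after the timeout."""
    exchanges[mex_id] = Exchange()
    scheduler.schedule(mex_id, now + timeout)


def on_reply(scheduler, exchanges, mex_id):
    """A reply arrived in time: mark it completed and cancel the check."""
    exchanges[mex_id].state = State.COMPLETED
    scheduler.cancel(mex_id)


scheduler, exchanges = Scheduler(), {}
start_invoke(scheduler, exchanges, "a", now=0, timeout=30)
start_invoke(scheduler, exchanges, "b", now=0, timeout=30)
on_reply(scheduler, exchanges, "a")  # normal case, check is cancelled
# ... server crashes and restarts; "b" never received its reply ...
forced = scheduler.run_due(now=60, exchanges=exchanges)
```

The check only looks at the exchange state, so it never has to reload the process instance to decide whether recovery is needed.
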
So I'm thinking we could generalize this for extension activities? The problem I see is that for the invoke we have some data external to the process (the message exchange) that we can check without reloading the whole thing just to see what it looks like. For extension activities I'm not sure there's an equivalent. So for those async extensions you might also need an additional table just to track the status of the "call" externally to the process.

Does that help?

Matthieu

>
> One possibility would be to also restart the extension implementation
> (how?) when the PI recovers, but this might be harmful when the
> extension has already done something that must not be repeated. So I
> guess it depends on the individual case, but I'm wondering how to deal
> with this problem appropriately.
>
> Any ideas? I hope so ;)
>
> Cheers,
> Tammo
>
