Hi Jakub,

I agree. I really should have thought about it more prior to posting it to the mailing list. Sorry.
And thanks for pointing out that leak - the test I sent in the first e-mail leaks the message if the user does not press any key.

- Vojta

On 12 April 2012 12:19, Jakub Jermar <[email protected]> wrote:
> Hi Vojta,
>
> On 11.4.2012 22:48, Vojtech Horky wrote:
>> On 11 April 2012 18:53, Jakub Jermar <[email protected]> wrote:
>>> On 11.4.2012 11:13, Vojtech Horky wrote:
>>>>> It seems to me that the current implementation understands the
>>>>> timeouts rather strictly. In other words, a zero timeout indicates a
>>>>> very impatient waiter which doesn't allow any time for the wakeup
>>>>> condition to become true, even if that would have happened instantly.
>>>>> I think it won't hurt if the wakeup condition is given one more
>>>>> non-blocking test to see if the waiter can get the message. Even if
>>>>> the message arrived just after the timeout, the caller will get the
>>>>> message and the timeout event will be silently ignored.
>>>> Are you referring to the ToDo in async_wait_timeout()?
>>>
>>> No, I was just describing the current behavior of timeout == 0. But now
>>> that you mention it, cannot the todo be eliminated by forcing the
>>> timeout to 0 and applying your patch?
>> That would not exhibit the same behavior as the todo requests. The zero
>> timeout would cause a fibril switch (to the manager), which is something
>> the todo wants to avoid. At least as I understand it.
>
> Well, the fibrils must assume that doing IPC using the async framework
> can lead to a fibril switch. The same goes for using timeouts and fibril
> synchronization primitives.
>
> I can't see why anyone would intentionally specify a negative timeout in
> the first place. Has a negative timeout ever been used in some
> meaningful way in the context of HelenOS async_wait_timeout()? Why
> should it behave differently from a timeout of 0 (the TODO is just a
> todo without any well-formed motivation and reasoning)?
>
> Anyway, my first impression was that the different handling was trying
> to avoid some hypothetical problem when doing struct timeval arithmetic
> or comparison with a value smaller than the current time. Or it was just
> a shortcut. Maybe the revision history would reveal more.
>
>>>> Maybe if the part around ipc_wait_cycle() were extracted into a
>>>> separate function that would also be called in async_wait_timeout(),
>>>> it might work better for those impatient clients?
>>>
>>> I am not sure what exactly you mean here, but I was basically satisfied
>>> with the patch; see below though.
>> Sorry for the bad explanation. See the attached patch. It deals with the
>> todo and actually leaves the original "fairness" intact. I think this
>> should not break anything. And it is probably a cleaner solution to the
>> problem.
>
> The patch is interesting, but also a little bit scary, or at least
> rather intrusive. Originally, only the manager fibril is expected to
> pump the IPC messages and do the handle_expired_timeouts() and
> handle_call() processing. While the former is probably not a big deal
> because of the non-blocking flag for ipc_wait_cycle(), you will now
> basically allow any fibril to do timeout and call processing for itself
> and other fibrils. It is not necessarily a problem, but reasoning about
> the change becomes more complicated.
>
> I suspect the new patch does not solve the starvation issue properly.
> Now if the timeout is, say, 1 (or close enough to 0), it may repeatedly
> happen that by the time it is evaluated in async_manager_worker(), it
> will already have expired and the message will not be pumped, just as
> happens now with timeout 0.
>
>>>>> Please double check that calling handle_expired_timeouts() also before
>>>>> going to ipc_wait_cycle() will not have any undesired side effects.
>>>> I am not sure I follow. Without the patch I sent, it won't be reached
>>>> anyway, and with the patch it seems redundant. What am I missing?
>>>
>>> The patch changes the behavior slightly. Now it can happen that the
>>> worker fibril will be woken up by the timeout and, in fact, can start
>>> executing if there is another manager fibril to schedule it. The first
>>> manager fibril will go to ipc_wait_cycle(), where it may notice a
>>> message and deliver it to a worker fibril, possibly the one which timed
>>> out. I wonder if there can be some kind of a race when the fibril
>>> notices the timeout but does not notice the message. Surely, it would
>>> notice the message on a second try. Which brings me to the following
>>> point.
>> Are you sure this could happen in a single-threaded application?
>
> For that you'd need more threads, but the async framework needs to be
> generic and ready for a situation like that.
>
>>> It also appears to me that, thanks to the missing free() in the handling
>>> of the timeout in async_wait_timeout(), the framework expects that it is
>>> possible to call async_wait() or async_wait_timeout() multiple times
>>> until the reply eventually arrives. Otherwise I think there is a leak of
>> That is correct (obviously, async_wait would just block, so you can
>> call it at most once). And it seems to me to be sensible behavior.
>>
>>> msg. It would be nice to be able to say something like: "Ok, the wait
>>> timed out, I will not call async_wait*() a second time." to free up the
>>> message.
>> Do you have any practical use case in mind where this might be useful?
>
> Yes. There are basically two possible scenarios:
>
> Scenario A: (not leaking, not a common pattern in HelenOS)
> You may want to have a loop which waits for the message until it
> arrives, but informs the user after some time that it has not arrived
> yet before going back to waiting. In this case, the caller will free up
> the amsg_t structure eventually, when the message finally arrives and
> the caller awakens in async_wait_timeout().
>
> Scenario B: (leaking, e.g.
> console_get_kbd_event_timeout())
> You may want to have the possibility to time out, give up waiting for
> some message, and do something else. In this case, the amsg_t structure
> will leak. It would therefore make sense, IMO, to have:
>
> async_wait_timeout(amsg_t *, ..., bool last_wait);
>
> If last_wait is true, async_wait_timeout() will make a note of it in the
> structure, and when the reply arrives sometime in the future (it will
> happen eventually), the structure will be deallocated in
> reply_received(). Of course, it would not be possible to call
> async_wait() or async_wait_timeout() on this message again. This is
> related to ticket #181.
>
> Jakub
>
> _______________________________________________
> HelenOS-devel mailing list
> [email protected]
> http://lists.modry.cz/cgi-bin/listinfo/helenos-devel
