Hi Vojta,

On 11.4.2012 22:48, Vojtech Horky wrote:
> On 11 April 2012 at 18:53, Jakub Jermar <[email protected]> wrote:
>> On 11.4.2012 11:13, Vojtech Horky wrote:
>>>> It seems to me that the current implementation understands the timeouts
>>>> rather strictly. In other words, zero timeout indicates a very impatient
>>>> waiter which doesn't allow any time for the wakeup condition to become
>>>> true even if that would have happened instantly. I think it won't hurt
>>>> if the wakeup condition is given one-more non-blocking test to see if
>>>> the waiter can get the message. Even if the message arrived just after
>>>> the timeout, the caller will get the message and the timeout event will
>>>> be silently ignored.
>>> Are you referring to the ToDo in async_wait_timeout()?
>>
>> No, I was just describing the current behavior of timeout == 0. But now
>> when you mention it, cannot the todo be eliminated by forcing timeout to
>> 0 and applying your patch?
> That would not exhibit the same behavior as the todo requests. A zero
> timeout would cause a fibril switch (to the manager), which is something
> the todo wants to avoid. At least as I understand it.

Well, the fibrils must assume that doing IPC through the async framework
can lead to a fibril switch. The same goes for using timeouts and fibril
synchronization primitives.

I can't see why anyone would intentionally specify a negative timeout in
the first place. Has a negative timeout ever been used in some
meaningful way in the context of HelenOS async_wait_timeout()? Why
should it behave differently from a timeout of 0 (the TODO is just a todo
without any well-formed motivation or reasoning)?

Anyway, my first impression was that the different handling was trying
to avoid some hypothetical problem with struct timeval arithmetic or
with comparing against a value smaller than the current time. Or it was
just a shortcut. Maybe the revision history would reveal more.
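
For illustration, clamping is all it would take. This is a standalone
sketch in plain C with POSIX timeval, not our libc calls, and
compute_expiration() is a made-up name:

  #include <stdio.h>
  #include <sys/time.h>

  /* Compute when a wait with the given timeout (in microseconds) expires.
   * Clamping makes a negative timeout behave exactly like zero, so the
   * expiration can never be pushed into the past by a negative addend
   * and the later timeval comparison needs no special case. */
  static void compute_expiration(struct timeval *expires, long timeout_usec)
  {
      if (timeout_usec < 0)
          timeout_usec = 0;

      gettimeofday(expires, NULL);
      expires->tv_usec += timeout_usec % 1000000;
      expires->tv_sec += timeout_usec / 1000000 + expires->tv_usec / 1000000;
      expires->tv_usec %= 1000000;
  }

  int main(void)
  {
      struct timeval expires;

      compute_expiration(&expires, -42);  /* bogus negative timeout */
      printf("expires at %ld.%06ld\n", (long) expires.tv_sec,
          (long) expires.tv_usec);
      return 0;
  }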

>>> Maybe if the part around ipc_wait_cycle() were extracted into a
>>> separate function that was also called from async_wait_timeout(),
>>> it might work better for those impatient clients?
>>
>> I am not sure what exactly you mean here. I was basically satisfied
>> with the patch, but see below.
> Sorry for the bad explanation. See the attached patch. It deals with the
> todo and actually leaves the original "fairness" intact. I think this
> should not break anything, and it is probably a cleaner solution to the
> problem.

The patch is interesting, but also a little bit scary, or at least rather
intrusive. Originally, only the manager fibril was expected to pump the
IPC messages and do the handle_expired_timeouts() and handle_call()
processing. While the former is probably not a big deal because of the
non-blocking flag for ipc_wait_cycle(), you now basically allow any
fibril to do timeout and call processing for itself and other fibrils.
That is not necessarily a problem, but reasoning about the change
becomes more complicated.

I suspect the new patch does not solve the starvation issue properly.
If the timeout is, say, 1 (or close enough to 0), it may repeatedly
happen that by the time it is evaluated in async_manager_worker() it
will already have expired, and the message will not be pumped, just as
happens now with a timeout of 0.
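
To show the shape of what I originally meant by giving the wakeup
condition one more non-blocking test, here is a standalone sketch using
plain POSIX threads instead of fibrils (wait_done_timeout() and the rest
are made up for the illustration, not async.c code):

  #include <errno.h>
  #include <pthread.h>
  #include <stdbool.h>
  #include <stdio.h>
  #include <time.h>

  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
  static bool done = false;

  /* Timed wait that gives the wakeup condition one final non-blocking
   * check even when the deadline has already passed. */
  static int wait_done_timeout(long timeout_usec)
  {
      struct timespec deadline;

      clock_gettime(CLOCK_REALTIME, &deadline);
      deadline.tv_nsec += (timeout_usec % 1000000) * 1000;
      deadline.tv_sec += timeout_usec / 1000000 + deadline.tv_nsec / 1000000000;
      deadline.tv_nsec %= 1000000000;

      pthread_mutex_lock(&lock);
      while (!done) {
          if (pthread_cond_timedwait(&cond, &lock, &deadline) == ETIMEDOUT)
              break;
      }
      /* The final check: if the condition became true just after the
       * timeout, the caller still gets the result and the timeout is
       * silently ignored. */
      bool ok = done;
      pthread_mutex_unlock(&lock);
      return ok ? 0 : ETIMEDOUT;
  }

  static void *producer(void *arg)
  {
      (void) arg;
      pthread_mutex_lock(&lock);
      done = true;
      pthread_cond_signal(&cond);
      pthread_mutex_unlock(&lock);
      return NULL;
  }

  int main(void)
  {
      pthread_t t;

      pthread_create(&t, NULL, producer, NULL);
      printf("wait result: %d\n", wait_done_timeout(0));  /* impatient waiter */
      pthread_join(&t, NULL);
      return 0;
  }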

>>>> Please double check that calling handle_expired_timeouts() also before
>>>> going to ipc_wait_cycle() will not have any undesired side effects.
>>> I am not sure I follow. Without the patch I sent, it won't be reached
>>> anyway, and with the patch it seems redundant. What am I missing?
>>
>> The patch changes the behavior slightly. Now it can happen that the
>> worker fibril will be woken up by the timeout and, in fact, can start
>> executing if there is another manager fibril to schedule it. The first
>> manager fibril will go to ipc_wait_cycle() where it may notice a message
>> and deliver it to a worker fibril, possibly the one which timed out. I
>> wonder if there can be some kind of a race when the fibril notices the
>> timeout, but does not notice the message. Surely, it would notice the
>> message on a second try. Which brings me to the following point.
> Are you sure this could happen in a single-threaded application?

For that you'd need more threads, but the async framework needs to be
generic and ready for a situation like that.

>> It also appears to me that, thanks to the missing free() in the handling
>> of the timeout in async_wait_timeout(), the framework expects that it is
>> possible to call async_wait() or async_wait_timeout() multiple times
>> until the reply eventually arrives. Otherwise I think there is a leak of
> That is correct (obviously, async_wait() would just block, so you can
> call it at most once). And it seems to me to be sensible behavior.
> 
>> msg. It would be nice to be able to say something like: "Ok, the wait
>> timed out, I will not call async_wait*() a second time." to free up the
>> message.
> Do you have any practical use case in mind where this might be useful?

Yes. There are basically two possible scenarios:

Scenario A: (not leaking, not a common pattern in HelenOS)
You may want to have a loop which waits for the message until it
arrives, but after some time informs the user that it has not arrived
yet before it goes back to waiting. In this case, the caller will free
up the amsg_t structure eventually, when the message finally arrives and
the caller wakes up in async_wait_timeout().
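
In code, Scenario A would look roughly like this (just a sketch; I am
assuming the async_wait_timeout(aid_t, sysarg_t *, suseconds_t)
prototype, ETIMEOUT as its timeout return value, and that req comes from
an earlier async_send_*() call):

  #include <async.h>
  #include <errno.h>
  #include <stdio.h>

  static int wait_verbosely(aid_t req)
  {
      sysarg_t retval;

      /* Wait in one-second slices and keep the user informed. */
      while (async_wait_timeout(req, &retval, 1000000) == ETIMEOUT)
          printf("Still waiting for the reply...\n");

      /* The reply has arrived; the framework freed the amsg_t for us. */
      return (int) retval;
  }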

Scenario B: (leaking, e.g. console_get_kbd_event_timeout())
You may want to have the possibility to time out, give up waiting for
the message, and do something else. In this case, the amsg_t structure
will leak. It would therefore make sense, IMO, to have:

  async_wait_timeout(amsg_t *, ..., bool last_wait);

If last_wait is true, async_wait_timeout() will make a note of it in the
structure, and when the reply arrives sometime in the future (it will
happen eventually), the structure will be deallocated in
reply_received(). Of course, it would not be possible to call
async_wait() or async_wait_timeout() on this message again. This is
related to ticket #181.
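
A toy, single-threaded model of the idea, just to make it concrete (all
the names below are made up, not the real async.c internals, and the
real code would of course need the async_futex protection):

  #include <stdbool.h>
  #include <stdio.h>
  #include <stdlib.h>

  typedef struct {
      bool done;      /* reply has arrived */
      bool forget;    /* nobody will ever wait on this message again */
      int retval;
  } toy_amsg_t;

  /* Models async_wait_timeout(..., bool last_wait) returning on timeout. */
  static int toy_wait_timeout(toy_amsg_t *msg, int *retval, bool last_wait)
  {
      if (!msg->done) {
          if (last_wait)
              msg->forget = true;  /* the caller gives up for good */
          return -1;  /* stands in for ETIMEOUT */
      }
      *retval = msg->retval;
      free(msg);
      return 0;
  }

  /* Models reply_received(): a late reply deallocates a forgotten message. */
  static void toy_reply_received(toy_amsg_t *msg, int retval)
  {
      msg->done = true;
      msg->retval = retval;
      if (msg->forget)
          free(msg);  /* no leak even though nobody waits anymore */
  }

  int main(void)
  {
      toy_amsg_t *msg = calloc(1, sizeof(toy_amsg_t));
      int retval;

      /* The wait times out and declares it will not be repeated. */
      if (toy_wait_timeout(msg, &retval, true) != 0)
          printf("timed out, forgetting the message\n");

      /* The reply arrives later and cleans up after itself. */
      toy_reply_received(msg, 0);
      return 0;
  }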

Jakub
