Hi Jakub,

I agree. I really should have thought about it more prior to posting it to the mailing list. Sorry.
And thanks for pointing out that leak - the test I sent in the first e-mail leaks the message if the user does not press any key.

- Vojta

On 12 April 2012 12:19, Jakub Jermar <[email protected]> wrote:
> Hi Vojta,
>
> On 11.4.2012 22:48, Vojtech Horky wrote:
>> On 11 April 2012 18:53, Jakub Jermar <[email protected]> wrote:
>>> On 11.4.2012 11:13, Vojtech Horky wrote:
>>>>> It seems to me that the current implementation understands the
>>>>> timeouts rather strictly. In other words, a zero timeout indicates a
>>>>> very impatient waiter which doesn't allow any time for the wakeup
>>>>> condition to become true, even if that would have happened instantly.
>>>>> I think it won't hurt if the wakeup condition is given one more
>>>>> non-blocking test to see if the waiter can get the message. Even if
>>>>> the message arrived just after the timeout, the caller will get the
>>>>> message and the timeout event will be silently ignored.
>>>> Are you referring to the ToDo in async_wait_timeout()?
>>>
>>> No, I was just describing the current behavior of timeout == 0. But now
>>> that you mention it, cannot the todo be eliminated by forcing the
>>> timeout to 0 and applying your patch?
>> That would not exhibit the same behavior as the todo requests. The zero
>> timeout would cause a fibril switch (to the manager), which is something
>> the todo wants to avoid. At least as I understand it.
>
> Well, the fibrils must assume that doing IPC using the async framework
> can lead to a fibril switch. The same goes for using timeouts and fibril
> synchronization primitives.
>
> I can't see why anyone would intentionally specify a negative timeout in
> the first place. Has a negative timeout ever been used in some
> meaningful way in the context of HelenOS async_wait_timeout()? Why
> should it behave differently from a timeout of 0 (the TODO is just a
> todo without any well-formed motivation and reasoning)?
>
> Anyway, my first impression was that the different handling was trying
> to avoid some hypothetical problem when doing struct timeval arithmetic
> or comparison with a value smaller than the current time. Or it was just
> a shortcut. Maybe the revision history would reveal more.
>
>>>> Maybe if the part around ipc_wait_cycle() were extracted into a
>>>> separate function that would also be called in async_wait_timeout(),
>>>> it might work better for those impatient clients?
>>>
>>> I am not sure what exactly you mean here, but I was basically satisfied
>>> with the patch; see below though.
>> Sorry for the bad explanation. See the attached patch. It deals with the
>> todo and actually leaves the original "fairness" intact. I think this
>> should not break anything. And it is probably a cleaner solution to the
>> problem.
>
> The patch is interesting, but also a little bit scary, or at least
> rather intrusive. Originally, only the manager fibril is expected to
> pump the IPC messages and do the handle_expired_timeouts() and
> handle_call() processing. While the former is probably not a big deal
> because of the non-blocking flag for ipc_wait_cycle(), you will now
> basically allow any fibril to do timeout and call processing for itself
> and other fibrils. It is not necessarily a problem, but reasoning about
> the change becomes more complicated.
>
> I suspect the new patch does not solve the starvation issue properly.
> Now if the timeout is, say, 1 (or close enough to 0), it may repeatedly
> happen that by the time it is evaluated in async_manager_worker(), it
> will already have expired and the message will not be pumped, just as
> happens now with timeout 0.
>
>>>>> Please double check that calling handle_expired_timeouts() also before
>>>>> going to ipc_wait_cycle() will not have any undesired side effects.
>>>> I am not sure I follow. Without the patch I sent, it won't be reached
>>>> anyway, and with the patch it seems redundant. What am I missing?
>>>
>>> The patch changes the behavior slightly. Now it can happen that the
>>> worker fibril will be woken up by the timeout and, in fact, can start
>>> executing if there is another manager fibril to schedule it. The first
>>> manager fibril will go to ipc_wait_cycle(), where it may notice a
>>> message and deliver it to a worker fibril, possibly the one which timed
>>> out. I wonder if there can be some kind of a race when the fibril
>>> notices the timeout but does not notice the message. Surely, it would
>>> notice the message on a second try. Which brings me to the following
>>> point.
>> Are you sure this could happen in a single-threaded application?
>
> For that you'd need more threads, but the async framework needs to be
> generic and ready for a situation like that.
>
>>> It also appears to me that, thanks to the missing free() in the handling
>>> of the timeout in async_wait_timeout(), the framework expects that it is
>>> possible to call async_wait() or async_wait_timeout() multiple times
>>> until the reply eventually arrives. Otherwise I think there is a leak of
>> That is correct (obviously, async_wait would just block, so you can
>> call it at most once). And it seems to me to be sensible behavior.
>>
>>> msg. It would be nice to be able to say something like: "Ok, the wait
>>> timed out, I will not call async_wait*() a second time." to free up the
>>> message.
>> Do you have any practical use case in mind where this might be useful?
>
> Yes. There are basically two possible scenarios:
>
> Scenario A: (not leaking, not a common pattern in HelenOS)
> You may want to have a loop which waits for the message until it
> arrives, but informs the user after some time that it has not arrived
> yet before going back to waiting. In this case, the caller will free up
> the amsg_t structure eventually, when the message finally arrives and
> the caller awakens in async_wait_timeout().
>
> Scenario B: (leaking, e.g.
> console_get_kbd_event_timeout())
> You may want to have the possibility to time out, give up waiting for
> some message, and do something else. In this case, the amsg_t structure
> will leak. It would therefore make sense, IMO, to have:
>
> async_wait_timeout(amsg_t *, ..., bool last_wait);
>
> If last_wait is true, async_wait_timeout() will make a note of it in the
> structure, and when the reply arrives sometime in the future (it will
> happen eventually), the structure will be deallocated in
> reply_received(). Of course, it would not be possible to call
> async_wait() or async_wait_timeout() on this message again. This is
> related to ticket #181.
>
> Jakub
>
> _______________________________________________
> HelenOS-devel mailing list
> [email protected]
> http://lists.modry.cz/cgi-bin/listinfo/helenos-devel
