On 10/27/2017 12:08 AM, Ondřej Hlavatý wrote:
> I started to experience deadlock while booting HelenOS. It does not
> happen every time, and when I add some debug prints, the deadlock
> disappears completely.
> 
> The issue starts in the ps2mouse driver, which adds a mouse function
> from its device_add operation. This remote call goes all the way to
> fun_online, where it holds the write lock (blocking other drivers) and,
> because the function is exposed, is probably waiting inside
> loc_register_tree_function, that is, in loc_service_register.
> 
> Looking at this function, the situation seems very similar to what Jakub
> Jermar describes at:
>       
> http://jakubsuniversalblog.blogspot.cz/2011/09/debugging-file-system-hang-using.html?q=deadlock
> 
> As far as I understand the issue, this should not be the case: this is
> the sender, not the receiver, and there is no cycle of messages waiting
> for each other. However, after swapping the order of releasing the
> exchange and waiting for the answer, the deadlock no longer occurs.
> 
> Can someone please confirm that the order there is correct?

For the record, here is the conversation between Ondrej and me from IRC:

<jermar> can you see some active calls from locsrv to devman?
<ohlavaty> tbf i cannot reproduce it anymore
<ohlavaty> but i think the only active calls were 4 to ethip
<jermar> from locsrv?
<ohlavaty> there was a chain of stuck messages, ending at ns
<ohlavaty> but ns wasn't sending anything
<jermar> that might be important
<jermar> ns was recently rewritten to use the async framework
<jermar> another interesting thing is that, unlike in my blog post, the
connection between devman and locsrv uses only one phone
<jermar> but I still fail to see anything that would prevent the answer
to LOC_SERVICE_REGISTER from ever being received
<jermar> it would be good if you could collect the output of ipc <task_id>
for all interesting task IDs
<jermar> I also observe that until the LOC_SERVICE_REGISTER call is
answered, locsrv cannot start processing another call
<jermar> because there is only one fibril and it is currently busy
processing LOC_SERVICE_REGISTER

J.

_______________________________________________
HelenOS-devel mailing list
HelenOS-devel@lists.modry.cz
http://lists.modry.cz/listinfo/helenos-devel
