Good to see the list is back again.

So, I've dug into this further. There is *definitely* a deadlock in the
combination of sawman + directfb + multiple layers. I turned on the
deadlock detection code in the fusion kernel module and it reported a
bunch of potential deadlocks:

FusionSkirmish: Potential deadlock between locked 0x25 and to be locked
0x23 in world 0!
FusionSkirmish: Potential deadlock between locked 0x27 and to be locked
0x26 in world 0!
... many lines elided ...
FusionSkirmish: Potential deadlock between locked 0x25 and to be locked
0x23 in world 0!
FusionSkirmish: Potential deadlock between locked 0x23 and to be locked
0x25 in world 0!


but most importantly, it output the following smoking gun to the /proc
filesystem:



10.7 h ( 588) 0x00000025 Layer Context [ 3].28,18,23 - 1x [0x00000002]
(599) 1 WAITING
10.7 h ( 588) 0x00000023 SaWMan [ 5].28,19,25,18,05 - 1x [0x00000003]
(604) 1 WAITING
10.7 h ( 588) 0x00000026 Layer Region [ 5].28,27,18,23,25
10.7 h ( 588) 0x00000027 Surface 1280x720 ARGB [ 5] 28,18,23,25,26
10.7 h ( 588) 0x00000006 DirectFB Main Pool
[24].35,34,33,32,31,30,2f,2e,11,2d,2c,13,2b,2a,29,28,19,27,26,23,25,18,05,01
10.7 h ( 588) 0x00000013 IntelCE_Surfaces [15]
28,26,35,34,33,32,31,30,2f,2e,2d,2c,23,25,27
10.7 h ( 599) 0x0000002b IntelCE Graphics Contex
[15].35,34,33,32,31,30,2f,2e,2d,26,2c,23,25,27,13
10.7 h ( 599) 0x0000002c Surface 1280x720 ARGB [ 2].23,25
10.7 h ( 599) 0x00000035 Surface 48x48 ARGB [ 0].
10.7 h ( 599) 0x00000034 Surface 48x48 ARGB [ 0].
10.7 h ( 588) 0x00000028 Layer Context [ 2].19,23 - 1x [0x00000003]
(604)
10.7 h ( 604) 0x00000037 Surface 1280x720 ARGB [ 0]
10.7 h ( 588) 0x0000000b Surface Pool [ 7].19,28,29,18,23,25,26
10.7 h ( 588) 0x00000004 Fusion Main Pool [14]
0c,28,29,19,25,26,0b,18,23,09,08,05,02,01
10.7 h ( 588) 0x00000003 Fusion Reactor Globals [11]
0c,28,29,19,25,26,0b,18,23,09,08
10.7 h ( 588) 0x0000000c Window Pool [ 2] 28,25
10.7 h ( 588) 0x00000029 Layer Region [ 4].2a,19,23,28
10.7 h ( 588) 0x0000002a Surface 1280x720 ARGB [ 4] 19,23,28,29
10.7 h ( 588) 0x00000019 Display Layer 1 [ 0].
10.7 h ( 588) 0x00000018 Display Layer 0 [ 0].
10.7 h ( 588) 0x00000024 DirectFB Core [ 0].
10.7 h ( 588) 0x00000005 Arena 'DirectFB/Core' [ 1] 02
10.7 h ( 588) 0x00000022 SaWMan Pool [ 3] 23,05,01
10.7 h ( 604) 0x00000036 IntelCE Graphics Contex [ 0]
10.7 h ( 599) 0x00000033 Surface 48x48 ARGB [ 0].
10.7 h ( 588) 0x00000001 Fusion SHM [ 1].05
10.7 h ( 588) 0x00000002 Fusion Arenas [ 0].
10.7 h ( 588) 0x0000000f Surface Memory Pool [ 2] 05,01
10.7 h ( 588) 0x00000007 DirectFB Data Pool [ 2] 05,01
10.7 h ( 599) 0x00000032 Surface 48x48 ARGB [ 0].
10.7 h ( 599) 0x00000031 Surface 47x48 ARGB [ 0].
10.7 h ( 599) 0x00000030 Surface 48x48 ARGB [ 0].
10.7 h ( 599) 0x0000002f Surface 48x48 ARGB [ 0].
10.7 h ( 599) 0x0000002e Surface 48x48 ARGB [ 0].
10.7 h ( 599) 0x0000002d Surface 400x400 ARGB [ 0].
10.7 h ( 588) 0x00000011 System Memory [ 9] 35,34,33,32,31,30,2f,2e,2d
10.7 h ( 588) 0x00000014 IntelCE Layer Configura [ 3] 23,25,26
10.7 h ( 588) 0x00000016 Gfx State Lock [ 3] 23,25,2c
10.7 h ( 588) 0x00000009 Layer Region Pool [ 3] 19,18,23
10.7 h ( 588) 0x00000008 Layer Context Pool [ 0].
10.7 h ( 588) 0x00000021 Display Layer 9 [ 0]
10.7 h ( 588) 0x00000020 Display Layer 8 [ 0]
10.7 h ( 588) 0x0000001f Display Layer 7 [ 0]
10.7 h ( 588) 0x0000001e Display Layer 6 [ 0]
10.7 h ( 588) 0x0000001d Display Layer 5 [ 0]
10.7 h ( 588) 0x0000001c Display Layer 4 [ 0]
10.7 h ( 588) 0x0000001b Display Layer 3 [ 0]
10.7 h ( 588) 0x0000001a Display Layer 2 [ 0]
10.7 h ( 588) 0x00000017 Screen 0 [ 0]
10.7 h ( 588) 0x00000015 IntelCE Graphics Contex [ 0]
10.7 h ( 588) 0x00000012 Preallocated Memory [ 0]
10.7 h ( 588) 0x00000010 Shared Memory [ 0]
10.7 h ( 588) 0x0000000e Colorhash Core [ 0]
10.7 h ( 588) 0x0000000d Clipboard Core [ 0]
10.7 h ( 588) 0x0000000a Palette Pool [ 0]


Here you can see that processes 599 and 604 (two instances of df_neo,
one running in layer 0, the other in layer 1) are deadlocked over the
Layer Context (0x25) and SaWMan (0x23) skirmishes.

The problem seems to be with the new df_neo instance in layer 1
changing the window configuration (in this case, setting the opacity),
while the first df_neo instance in layer 0 is trying to flip.

- df_neo-0 (process 599) is in the midst of a StretchBlit call, and
grabs a lock on a graphics surface
- df_neo-1 (process 604) calls SetWindowOpacity, which delegates to the
sawman window manager. df_neo-1 grabs the SaWMan skirmish (35 = 0x23)
before making a number of fusion calls to the wm (process 588), which
performs some window layout, configuring a bunch of windows and
returns.
- Upon return to df_neo-1, df_neo-1 then calls sawman_process_updates
to apply those configuration changes. Along the way, it calls
dfb_layer_context_set_configuration(). This function grabs Layer
Context skirmish (37 = 0x25) and tries to do something to (destroy?)
the surface currently in use by df_neo-0 (for some reason, not sure why
it should need to deal with it, since it's in a separate layer). This
surface is locked by df_neo-0, and so df_neo-1 appears to busy wait
until it is released
- df_neo-0 eventually finishes with its stretch blit and releases the
lock on the surface
- df_neo-1 starts doing something to the surface (destroying it?)
- df_neo-0 (process 599) decides that it wants to flip, and eventually
calls dfb_window_repaint. This method attempts to grab the Layer
Context skirmish, which is currently held by df_neo-1.
- df_neo-1 finishes its call to dfb_layer_context_set_configuration(),
releasing the Layer Context skirmish
- df_neo-0 grabs the Layer Context skirmish, and continues with the
Flip call, which eventually calls dfb_wm_update_window(), which of
course calls into SaWMan, which attempts to grab the SaWMan skirmish...
currently held by df_neo-1
- df_neo-1, now back in sawman_process_updates, eventually tries to
call dfb_layer_context_set_screenposition(), which attempts to grab the
Layer Context skirmish, currently held by df_neo-0.
- oops... deadlock.
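
The end state of the sequence above can be sketched as a tiny wait-for
check. This is only a model I made up to illustrate the cycle, not
Fusion code: the ProcState struct and the in_wait_cycle() helper are
hypothetical, and the only real values are the pids and skirmish ids
taken from the /proc dump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not Fusion code: each process holds at most one
 * skirmish of interest and waits on at most one. */
#define NO_SKIRMISH (-1)

typedef struct {
    int pid;    /* process id as shown in the /proc dump */
    int holds;  /* skirmish id currently held, or NO_SKIRMISH */
    int wants;  /* skirmish id being waited on, or NO_SKIRMISH */
} ProcState;

/* Return the index of the process holding `skirmish`, or -1 if free. */
int holder_of(const ProcState *procs, size_t n, int skirmish)
{
    for (size_t i = 0; i < n; i++)
        if (procs[i].holds == skirmish)
            return (int)i;
    return -1;
}

/* Follow the wait-for chain starting at procs[start]. Returns true if
 * the chain leads back to the starting process, i.e. it can never run. */
bool in_wait_cycle(const ProcState *procs, size_t n, size_t start)
{
    size_t cur = start;

    for (size_t steps = 0; steps < n; steps++) {
        if (procs[cur].wants == NO_SKIRMISH)
            return false;               /* not blocked at all */

        int next = holder_of(procs, n, procs[cur].wants);
        if (next < 0)
            return false;               /* wanted skirmish is free */
        if ((size_t)next == start)
            return true;                /* chain closed: deadlock */

        cur = (size_t)next;
    }
    return false;
}
```

Filling in the table with 599 holding 0x25 and wanting 0x23, and 604
holding 0x23 and wanting 0x25, makes in_wait_cycle() report a cycle for
both processes, which matches the two WAITING entries in the dump.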

Due to the way the two processes synchronize on the surface, this
deadlock is pretty much guaranteed to happen. It could probably be
reduced in frequency by having sawman_process_updates() grab the Layer
Context skirmish for its entire duration. However, it seems there still
is a risk window for a deadlock should another process just happen to
make a Flip call between the time when SetWindowOpacity is called and
the window manager process finishes responding to the window
configuration requests.

The basic problem, unfortunately, seems to be one of design, in that
there appears to be no enforced order of lock acquisition. If you enter
the directfb API through most window oriented methods, you will grab
the SaWMan skirmish before most anything else. If you enter through
more basic DirectFB methods, you will likely grab other skirmishes
before possibly needing to talk to the window manager and grabbing the
SaWMan skirmish.
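
To make the idea of an enforced order concrete, here is a minimal
sketch of a lock-rank check. Everything in it is an assumption for
illustration: DirectFB and SaWMan define no such ranks, and the RANK_*
values, LockStack and the two helper functions are hypothetical. A real
version would keep one LockStack per thread and call the check before
each fusion_skirmish_prevail().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ranks; the real code defines no such ordering. A lock
 * may only be taken if its rank is higher than every rank held. */
enum {
    RANK_SAWMAN        = 1,
    RANK_LAYER_CONTEXT = 2,
    RANK_SURFACE       = 3
};

#define MAX_HELD 8

typedef struct {
    int held[MAX_HELD];  /* ranks held, in acquisition order */
    int count;
} LockStack;

/* Record an acquisition. Returns false if taking `rank` now would
 * violate the global order (or the stack is full). Because the stack
 * is strictly increasing, its top entry is the highest rank held. */
bool lock_order_acquire(LockStack *s, int rank)
{
    if (s->count >= MAX_HELD)
        return false;
    if (s->count > 0 && s->held[s->count - 1] >= rank)
        return false;

    s->held[s->count++] = rank;
    return true;
}

/* Record a release. Only the most recently taken lock may be released. */
bool lock_order_release(LockStack *s, int rank)
{
    if (s->count == 0 || s->held[s->count - 1] != rank)
        return false;

    s->count--;
    return true;
}
```

With these ranks the SetOpacity path (SaWMan, then Layer Context) is
accepted, while the Flip path (Layer Context, then SaWMan) is rejected
at the second acquire, which is exactly the inversion described above.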

In a following email, I will include the debug output from df_neo-0
and df_neo-1 for the last relevant calls to directfb. I can't attach it
to this email without exceeding the maximum message size.

Any help would be much appreciated.

Richard

Richard Lee wrote:
Hi there-

I'm trying to use sawman as a window manager for directfb. However, if I try to put two programs in two different dfb layers, sawman seems to lock up. From inspecting the stack trace (see below), it looks like there might be a lock inversion problem.

Everything is fine for the first process attaching to the window manager. Also for a second process running in the same layer as the first. However, if the second process is in a *different* layer than the first, everything comes to a halt.

The deadlock might be here:
- process #3 starts up and sets its window opacity
   - in the process of doing so, it grabs a lock on the layer context
     when it locks the window stack
   - it then does a fusion call to have the wm act on the request
- process #1 gets the request and figures out what it's going
   to do.
   - in the process, it grabs the sawman manager lock
   - when calling ISawmanManager::ProcessUpdates, the
     code eventually tries to lock the layer context when
     setting its position, but it is already locked by process #3
     --> DEADLOCK
- process #2 is just doing its thing and eventually wants
   to update its window, thus it attempts to grab the
   sawman manager lock.  However processes #1 and #3 are
   deadlocked, so it hangs, too.

It seems to me that this deadlock should happen regardless of whether the windows are in the same or different layers, but it only seems to happen if the layers are different. Something special must happen in this case and I don't yet fully understand the code.

Any help would be appreciated!

Richard


process #1 (slightly modified testman):
--------------------------
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7da7329 in ioctl () from /home/trees/neo/proto/dev-ia32/tivo_root/lib/libc.so.6
#2  0xb7e62123 in fusion_skirmish_prevail () from /home/trees/neo/proto/dev-ia32/tivo_root/lib/libfusion-1.2.so.0
#3  0xb7f01eb2 in dfb_layer_context_lock () from /home/trees/neo/proto/dev-ia32/tivo_root/lib/libdirectfb-1.2.so.0
#4  0xb7f03f3d in dfb_layer_context_set_screenposition () from /home/trees/neo/proto/dev-ia32/tivo_root/lib/libdirectfb-1.2.so.0
#5  0xb7e24046 in sawman_process_updates (sawman=0x2012b000, flags=DSFLIP_WAITFORSYNC) at /home/trees/neo/proto/srcroot/opensource/sawman/SaWMan-1.4.0/src/sawman.c:2446
#6  0xb7e218c7 in ISaWManManager_ProcessUpdates (thiz=0x8060560, flags=DSFLIP_WAITFORSYNC) at /home/trees/neo/proto/srcroot/opensource/sawman/SaWMan-1.4.0/src/isawmanmanager.c:129
#7  0x08049789 in MosaicRelayout (tm=0xbf9ef6e4, layout_data=0x0) at /home/trees/neo/proto/srcroot/opensource/sawman/SaWMan-1.4.0/samples/testman.c:387
#8  0x08049921 in MosaicAddWindow (tm=0xbf9ef6e4, layout_data=0x0, window=538107136) at /home/trees/neo/proto/srcroot/opensource/sawman/SaWMan-1.4.0/samples/testman.c:457
#9  0x08048d7b in LayoutWindowAdd (tm=0x40040401, window=536965256) at /home/trees/neo/proto/srcroot/opensource/sawman/SaWMan-1.4.0/samples/testman.c:520
#10 0x08048f6c in window_reconfig (context=0xbf9ef6e4, reconfig=0x2012b2d8) at /home/trees/neo/proto/srcroot/opensource/sawman/SaWMan-1.4.0/samples/testman.c:892
#11 0xb7e21bc5 in manager_call_handler (caller=3, call_arg=8, call_ptr=0x40040401, ctx=0x2012b000, serial=9, ret_val=0xb7bd2ea4) at /home/trees/neo/proto/srcroot/opensource/sawman/SaWMan-1.4.0/src/sawman.c:340
#12 0xb7e5e981 in _fusion_call_process () from /home/trees/neo/proto/dev-ia32/tivo_root/lib/libfusion-1.2.so.0
#13 0xb7e60996 in fusion_dispatch_loop () from /home/trees/neo/proto/dev-ia32/tivo_root/lib/libfusion-1.2.so.0
#14 0xb7e5449c in direct_thread_main () from /home/trees/neo/proto/dev-ia32/tivo_root/lib/libdirect-1.2.so.0
#15 0xb7e33151 in start_thread (arg=0xb7bd3b90) at pthread_create.c:297
#16 0xb7dae3fe in clone () from /home/trees/neo/proto/dev-ia32/tivo_root/lib/libc.so.6


(process #2) slightly modified df_neo to run in window in layer C (the primary layer)
----------------------------------------------------------------
#0  0xffffe410 in ?? ()
#1  0xbfe98618 in ?? ()
#2  0x2012b004 in ?? ()
#3  0x40040401 in ?? ()
#4  0xb7db1329 in ioctl () from /home/trees/neo/proto/dev-ia32/tivo_root/lib/libc.so.6
#5  0xb7e84123 in fusion_skirmish_prevail () from /home/trees/neo/proto/dev-ia32/tivo_root/lib/libfusion-1.2.so.0
#6  0xb724a32d in wm_update_window (window=0x20017400, wm_data=0x805fdc0, window_data=0x2012df00, region=0xbfe9870c, flags=DSFLIP_ONSYNC) at /home/trees/neo/proto/srcroot/opensource/sawman/SaWMan-1.4.0/src/sawman_internal.h:331
#7  0xb7f36528 in dfb_wm_update_window () from /home/trees/neo/proto/dev-ia32/tivo_root/lib/libdirectfb-1.2.so.0
#8  0xb7f32455 in dfb_window_repaint () from /home/trees/neo/proto/dev-ia32/tivo_root/lib/libdirectfb-1.2.so.0
#9  0xb7ea312c in IDirectFBSurface_Window_Flip () from /home/trees/neo/proto/dev-ia32/tivo_root/lib/libdirectfb-1.2.so.0
#10 0x080496a9 in main ()



(process #3) slightly modified df_neo to run in a window in layer B
----------------------------------------------------------------
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7d82329 in ioctl () from /home/trees/neo/proto/dev-ia32/tivo_root/lib/libc.so.6
#2  0xb7e51d28 in fusion_call_execute () from /home/trees/neo/proto/dev-ia32/tivo_root/lib/libfusion-1.2.so.0
#3  0xb72115a3 in sawman_call (sawman=0xbf84edf4, call=1075053057, ptr=0x2012b2d8) at /home/trees/neo/proto/srcroot/opensource/sawman/SaWMan-1.4.0/src/sawman.c:740
#4  0xb721cc40 in wm_set_window_config (window=0x20019e00, wm_data=0x805fdc0, window_data=0x2012dd00, updated=0xbf84f004, flags=CWCF_OPACITY) at /home/trees/neo/proto/srcroot/opensource/sawman/SaWMan-1.4.0/wm/sawman/sawman_wm.c:2524
#5  0xb7f07308 in dfb_wm_set_window_config () from /home/trees/neo/proto/dev-ia32/tivo_root/lib/libdirectfb-1.2.so.0
#6  0xb7f0391e in dfb_window_set_opacity () from /home/trees/neo/proto/dev-ia32/tivo_root/lib/libdirectfb-1.2.so.0
#7  0xb7e7a9da in IDirectFBWindow_SetOpacity () from /home/trees/neo/proto/dev-ia32/tivo_root/lib/libdirectfb-1.2.so.0
#8  0x08048b3c in main ()
_______________________________________________
directfb-dev mailing list
directfb-dev@directfb.org
http://mail.directfb.org/cgi-bin/mailman/listinfo/directfb-dev