Re: [Mesa-dev] [PATCH] loader/dri3: Improve dri3 thread-safety

2017-11-21 Thread Thomas Hellstrom

Hi, Nicolai,

On 11/21/2017 11:50 AM, Nicolai Hähnle wrote:

Hi Thomas,

On 20.11.2017 18:19, Thomas Hellstrom wrote:
Is this a multithreaded test? Do you see the error with Ubuntu 17.10 
+ Xorg?


The dEQP-GLES31 tests are all single-threaded.

I could finally test with the Ubuntu 17.10 on Xorg desktop on the same 
system (some other tests were running before in parallel), and indeed 
I don't get hangs there. Looks like it's due to Xwayland -- whether a 
bug in there or "just" some surprising interaction.


Cheers,
Nicolai




Yes, it indeed sounds like the event is never being sent. Could you file 
a bug so that we can track this problem?


/Thomas



/Thomas


On 11/20/2017 06:03 PM, Nicolai Hähnle wrote:

Hi Thomas,

not actually a regression, but I am seeing hangs now with builds 
that include this patch when running the dEQP-GLES31 test. The 
backtrace is:


#3  0x7f947918262a in xcb_wait_for_special_event () from 
/usr/lib/x86_64-linux-gnu/libxcb.so.1
#4  0x7f947c80984e in dri3_wait_for_event_locked 
(draw=0x6140cc78)

    at ../../../mesa-src/src/loader/loader_dri3_helper.c:486
#5  0x7f947c80b8e8 in loader_dri3_wait_for_sbc 
(draw=draw@entry=0x6140cc78, target_sbc=61,
    target_sbc@entry=0, ust=ust@entry=0x7ffd1510d230, 
msc=msc@entry=0x7ffd1510d270,
    sbc=sbc@entry=0x7ffd1510d2b0) at 
../../../mesa-src/src/loader/loader_dri3_helper.c:562
#6  0x7f947c80fbe4 in loader_dri3_swapbuffer_barrier 
(draw=0x6140cc78)

    at ../../../mesa-src/src/loader/loader_dri3_helper.c:1714
#7  0x7f9474ced029 in dri_st_framebuffer_flush_swapbuffers 
(stctx=,
    stfbi=) at 
../../../../../mesa-src/src/gallium/state_trackers/dri/dri_drawable.c:135

#8  0x7f94747730c0 in st_finish (st=st@entry=0x6260c100)
    at ../../../mesa-src/src/mesa/state_tracker/st_cb_flush.c:74
#9  0x7f947477319f in st_glFinish (ctx=)
    at ../../../mesa-src/src/mesa/state_tracker/st_cb_flush.c:104
#10 0x5595fe7d5cfd in glu::CallLogWrapper::glFinish 
(this=0x6030002b9b10)
    at 
/home/nha/amd/tests/deqp/framework/opengl/gluCallLogWrapper.inl:1166
#11 0x5595fe3179dc in deqp::gles31::Functional::(anonymous 
namespace)::GeometryShaderRenderTest::renderWithContext 
(this=this@entry=0x61200dc0, ctx=..., program=..., dstSurface=...)
    at 
/home/nha/amd/tests/deqp/modules/gles31/functional/es31fGeometryShaderTests.cpp:2145 

#12 0x5595fe31d2dc in deqp::gles31::Functional::(anonymous 
namespace)::GeometryShaderRenderTest::iterate (this=0x61200dc0)
    at 
/home/nha/amd/tests/deqp/modules/gles31/functional/es31fGeometryShaderTests.cpp:1934 

#13 0x5595fe2d6cfe in deqp::gles31::TestCaseWrapper::iterate 
(this=0x6020e950,
    testCase=) at 
/home/nha/amd/tests/deqp/modules/gles31/tes31TestPackage.cpp:76
#14 0x5595fe7417d3 in tcu::TestSessionExecutor::iterateTestCase 
(this=this@entry=0x60e0df60,

    testCase=0x61200dc0)

A command-line that fairly reliably causes the hang (but not always 
in the same subtest) is:


/home/nha/amd/tests/deqp/build/modules/gles31/deqp-gles31 

Re: [Mesa-dev] [PATCH] loader/dri3: Improve dri3 thread-safety

2017-11-21 Thread Nicolai Hähnle

Hi Thomas,

On 20.11.2017 18:19, Thomas Hellstrom wrote:
Is this a multithreaded test? Do you see the error with Ubuntu 17.10 + 
Xorg?


The dEQP-GLES31 tests are all single-threaded.

I could finally test with the Ubuntu 17.10 on Xorg desktop on the same 
system (some other tests were running before in parallel), and indeed I 
don't get hangs there. Looks like it's due to Xwayland -- whether a bug 
in there or "just" some surprising interaction.


Cheers,
Nicolai




/Thomas


On 11/20/2017 06:03 PM, Nicolai Hähnle wrote:

Hi Thomas,

not actually a regression, but I am seeing hangs now with builds that 
include this patch when running the dEQP-GLES31 test. The backtrace is:


#3  0x7f947918262a in xcb_wait_for_special_event () from 
/usr/lib/x86_64-linux-gnu/libxcb.so.1
#4  0x7f947c80984e in dri3_wait_for_event_locked 
(draw=0x6140cc78)

    at ../../../mesa-src/src/loader/loader_dri3_helper.c:486
#5  0x7f947c80b8e8 in loader_dri3_wait_for_sbc 
(draw=draw@entry=0x6140cc78, target_sbc=61,
    target_sbc@entry=0, ust=ust@entry=0x7ffd1510d230, 
msc=msc@entry=0x7ffd1510d270,
    sbc=sbc@entry=0x7ffd1510d2b0) at 
../../../mesa-src/src/loader/loader_dri3_helper.c:562
#6  0x7f947c80fbe4 in loader_dri3_swapbuffer_barrier 
(draw=0x6140cc78)

    at ../../../mesa-src/src/loader/loader_dri3_helper.c:1714
#7  0x7f9474ced029 in dri_st_framebuffer_flush_swapbuffers 
(stctx=,
    stfbi=) at 
../../../../../mesa-src/src/gallium/state_trackers/dri/dri_drawable.c:135

#8  0x7f94747730c0 in st_finish (st=st@entry=0x6260c100)
    at ../../../mesa-src/src/mesa/state_tracker/st_cb_flush.c:74
#9  0x7f947477319f in st_glFinish (ctx=)
    at ../../../mesa-src/src/mesa/state_tracker/st_cb_flush.c:104
#10 0x5595fe7d5cfd in glu::CallLogWrapper::glFinish 
(this=0x6030002b9b10)
    at 
/home/nha/amd/tests/deqp/framework/opengl/gluCallLogWrapper.inl:1166
#11 0x5595fe3179dc in deqp::gles31::Functional::(anonymous 
namespace)::GeometryShaderRenderTest::renderWithContext 
(this=this@entry=0x61200dc0, ctx=..., program=..., dstSurface=...)
    at 
/home/nha/amd/tests/deqp/modules/gles31/functional/es31fGeometryShaderTests.cpp:2145 

#12 0x5595fe31d2dc in deqp::gles31::Functional::(anonymous 
namespace)::GeometryShaderRenderTest::iterate (this=0x61200dc0)
    at 
/home/nha/amd/tests/deqp/modules/gles31/functional/es31fGeometryShaderTests.cpp:1934 

#13 0x5595fe2d6cfe in deqp::gles31::TestCaseWrapper::iterate 
(this=0x6020e950,
    testCase=) at 
/home/nha/amd/tests/deqp/modules/gles31/tes31TestPackage.cpp:76
#14 0x5595fe7417d3 in tcu::TestSessionExecutor::iterateTestCase 
(this=this@entry=0x60e0df60,

    testCase=0x61200dc0)

A command-line that fairly reliably causes the hang (but not always in 
the same subtest) is:


/home/nha/amd/tests/deqp/build/modules/gles31/deqp-gles31 

Re: [Mesa-dev] [PATCH] loader/dri3: Improve dri3 thread-safety

2017-11-20 Thread Thomas Hellstrom

Hi, Nicholai,

Is this a multithreaded test? Do you see the error with Ubuntu 17.10 + Xorg?

/Thomas


On 11/20/2017 06:03 PM, Nicolai Hähnle wrote:

Hi Thomas,

not actually a regression, but I am seeing hangs now with builds that 
include this patch when running the dEQP-GLES31 test. The backtrace is:


#3  0x7f947918262a in xcb_wait_for_special_event () from 
/usr/lib/x86_64-linux-gnu/libxcb.so.1
#4  0x7f947c80984e in dri3_wait_for_event_locked 
(draw=0x6140cc78)

    at ../../../mesa-src/src/loader/loader_dri3_helper.c:486
#5  0x7f947c80b8e8 in loader_dri3_wait_for_sbc 
(draw=draw@entry=0x6140cc78, target_sbc=61,
    target_sbc@entry=0, ust=ust@entry=0x7ffd1510d230, 
msc=msc@entry=0x7ffd1510d270,
    sbc=sbc@entry=0x7ffd1510d2b0) at 
../../../mesa-src/src/loader/loader_dri3_helper.c:562
#6  0x7f947c80fbe4 in loader_dri3_swapbuffer_barrier 
(draw=0x6140cc78)

    at ../../../mesa-src/src/loader/loader_dri3_helper.c:1714
#7  0x7f9474ced029 in dri_st_framebuffer_flush_swapbuffers 
(stctx=,
    stfbi=) at 
../../../../../mesa-src/src/gallium/state_trackers/dri/dri_drawable.c:135

#8  0x7f94747730c0 in st_finish (st=st@entry=0x6260c100)
    at ../../../mesa-src/src/mesa/state_tracker/st_cb_flush.c:74
#9  0x7f947477319f in st_glFinish (ctx=)
    at ../../../mesa-src/src/mesa/state_tracker/st_cb_flush.c:104
#10 0x5595fe7d5cfd in glu::CallLogWrapper::glFinish 
(this=0x6030002b9b10)
    at 
/home/nha/amd/tests/deqp/framework/opengl/gluCallLogWrapper.inl:1166
#11 0x5595fe3179dc in deqp::gles31::Functional::(anonymous 
namespace)::GeometryShaderRenderTest::renderWithContext 
(this=this@entry=0x61200dc0, ctx=..., program=..., dstSurface=...)
    at 
/home/nha/amd/tests/deqp/modules/gles31/functional/es31fGeometryShaderTests.cpp:2145
#12 0x5595fe31d2dc in deqp::gles31::Functional::(anonymous 
namespace)::GeometryShaderRenderTest::iterate (this=0x61200dc0)
    at 
/home/nha/amd/tests/deqp/modules/gles31/functional/es31fGeometryShaderTests.cpp:1934
#13 0x5595fe2d6cfe in deqp::gles31::TestCaseWrapper::iterate 
(this=0x6020e950,
    testCase=) at 
/home/nha/amd/tests/deqp/modules/gles31/tes31TestPackage.cpp:76
#14 0x5595fe7417d3 in tcu::TestSessionExecutor::iterateTestCase 
(this=this@entry=0x60e0df60,

    testCase=0x61200dc0)

A command-line that fairly reliably causes the hang (but not always in 
the same subtest) is:


/home/nha/amd/tests/deqp/build/modules/gles31/deqp-gles31 

Re: [Mesa-dev] [PATCH] loader/dri3: Improve dri3 thread-safety

2017-11-20 Thread Nicolai Hähnle

Hi Thomas,

not actually a regression, but I am seeing hangs now with builds that 
include this patch when running the dEQP-GLES31 test. The backtrace is:


#3  0x7f947918262a in xcb_wait_for_special_event () from 
/usr/lib/x86_64-linux-gnu/libxcb.so.1

#4  0x7f947c80984e in dri3_wait_for_event_locked (draw=0x6140cc78)
at ../../../mesa-src/src/loader/loader_dri3_helper.c:486
#5  0x7f947c80b8e8 in loader_dri3_wait_for_sbc 
(draw=draw@entry=0x6140cc78, target_sbc=61,
target_sbc@entry=0, ust=ust@entry=0x7ffd1510d230, 
msc=msc@entry=0x7ffd1510d270,
sbc=sbc@entry=0x7ffd1510d2b0) at 
../../../mesa-src/src/loader/loader_dri3_helper.c:562
#6  0x7f947c80fbe4 in loader_dri3_swapbuffer_barrier 
(draw=0x6140cc78)

at ../../../mesa-src/src/loader/loader_dri3_helper.c:1714
#7  0x7f9474ced029 in dri_st_framebuffer_flush_swapbuffers 
(stctx=,
stfbi=) at 
../../../../../mesa-src/src/gallium/state_trackers/dri/dri_drawable.c:135

#8  0x7f94747730c0 in st_finish (st=st@entry=0x6260c100)
at ../../../mesa-src/src/mesa/state_tracker/st_cb_flush.c:74
#9  0x7f947477319f in st_glFinish (ctx=)
at ../../../mesa-src/src/mesa/state_tracker/st_cb_flush.c:104
#10 0x5595fe7d5cfd in glu::CallLogWrapper::glFinish 
(this=0x6030002b9b10)

at /home/nha/amd/tests/deqp/framework/opengl/gluCallLogWrapper.inl:1166
#11 0x5595fe3179dc in deqp::gles31::Functional::(anonymous 
namespace)::GeometryShaderRenderTest::renderWithContext 
(this=this@entry=0x61200dc0, ctx=..., program=..., dstSurface=...)
at 
/home/nha/amd/tests/deqp/modules/gles31/functional/es31fGeometryShaderTests.cpp:2145
#12 0x5595fe31d2dc in deqp::gles31::Functional::(anonymous 
namespace)::GeometryShaderRenderTest::iterate (this=0x61200dc0)
at 
/home/nha/amd/tests/deqp/modules/gles31/functional/es31fGeometryShaderTests.cpp:1934
#13 0x5595fe2d6cfe in deqp::gles31::TestCaseWrapper::iterate 
(this=0x6020e950,
testCase=) at 
/home/nha/amd/tests/deqp/modules/gles31/tes31TestPackage.cpp:76
#14 0x5595fe7417d3 in tcu::TestSessionExecutor::iterateTestCase 
(this=this@entry=0x60e0df60,

testCase=0x61200dc0)

A command-line that fairly reliably causes the hang (but not always in 
the same subtest) is:


/home/nha/amd/tests/deqp/build/modules/gles31/deqp-gles31 

Re: [Mesa-dev] [PATCH] loader/dri3: Improve dri3 thread-safety

2017-11-07 Thread Nicolai Hähnle

On 06.11.2017 12:53, Thomas Hellstrom wrote:

On 11/06/2017 12:14 PM, Nicolai Hähnle wrote:

On 03.11.2017 12:02, Thomas Hellstrom wrote:
It turned out that with recent changes that call into dri3 from 
glFinish(),
it appears like different thread end up waiting for X events 
simultaneously,
causing deadlocks since they steal events from eachoter and update 
the dri3

counters behind eachothers backs.

This patch intends to improve on that. It allows at most one thread at a
time to wait on events for a single drawable. If another thread 
intends to
do the same, it's put to sleep until the first thread finishes 
waiting, and
then it rechecks counters and optionally retries the waiting. Threads 
that

poll for X events never pulls X events off the event queue if there are
other threads waiting for events on that drawable. Counters in the
dri3 drawable structure are protected by a mutex. Finally, the mutex we
introduce is never held while waiting for the X server to avoid
unnecessary stalls.

This does not make dri3 drawables completely thread-safe but at least 
it's a

first step.

Bugzilla: 
https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.freedesktop.org_show-5Fbug.cgi-3Fid-3D102358=DwICaQ=uilaK90D4TOVoH58JNXRgQ=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ=yqE5Xb9bQg5hA8gP7s0b3dSSZoPWaAEKACqD8qfhdZo=wf-xBzPlZ805RCV1hnDgoyW0fYe2ZX3Qwie66SU936g= 

Fixes: d5ba75f8881 "st/dri2 Plumb the flush_swapbuffer functionality 
through to dri3"

Signed-off-by: Thomas Hellstrom 
---
  src/loader/loader_dri3_helper.c | 77 
+++--

  src/loader/loader_dri3_helper.h | 10 ++
  2 files changed, 69 insertions(+), 18 deletions(-)

diff --git a/src/loader/loader_dri3_helper.c 
b/src/loader/loader_dri3_helper.c

index 19ab581..7e6b8b2 100644
--- a/src/loader/loader_dri3_helper.c
+++ b/src/loader/loader_dri3_helper.c
@@ -32,7 +32,6 @@
    #include 
  -#include 
  #include "loader_dri3_helper.h"
    /* From xmlpool/options.h, user exposed so should be stable */
@@ -186,8 +185,11 @@ dri3_fence_await(xcb_connection_t *c, struct 
loader_dri3_drawable *draw,

  {
 xcb_flush(c);
 xshmfence_await(buffer->shm_fence);
-   if (draw)
+   if (draw) {
+  mtx_lock(>mtx);
    dri3_flush_present_events(draw);
+  mtx_unlock(>mtx);
+   }
  }
    static void
@@ -245,6 +247,9 @@ loader_dri3_drawable_fini(struct 
loader_dri3_drawable *draw)

    xcb_discard_reply(draw->conn, cookie.sequence);
    xcb_unregister_for_special_event(draw->conn, 
draw->special_event);

 }
+
+   cnd_destroy(>event_cnd);
+   mtx_destroy(>mtx);
  }
    int
@@ -276,6 +281,8 @@ loader_dri3_drawable_init(xcb_connection_t *conn,
   draw->cur_blit_source = -1;
 draw->back_format = __DRI_IMAGE_FORMAT_NONE;
+   mtx_init(>mtx, mtx_plain);
+   cnd_init(>event_cnd);
   if (draw->ext->config)
draw->ext->config->configQueryi(draw->dri_screen,
@@ -407,13 +414,27 @@ dri3_handle_present_event(struct 
loader_dri3_drawable *draw,

  }
    static bool
-dri3_wait_for_event(struct loader_dri3_drawable *draw)
+dri3_wait_for_event_locked(struct loader_dri3_drawable *draw)
  {
 xcb_generic_event_t *ev;
 xcb_present_generic_event_t *ge;
   xcb_flush(draw->conn);


(Why) Doesn't the flush need to be protected as well? Can it be 
removed entirely given that it's already called from 
loader_dri3_wait_for_msc? Though dri3_find_back is different - why?




Thanks for reviewing.

AFAIK, (correc me if I'm wrong) xcb should be thread-safe enough to 
allow multiple threads to call xcb_flush() simultaneously so we 
shouldn't need to explicitly "serialize" it.


There might indeed be unnecessary xcb_flushes(). In the dri3_find_back 
case when we're initially just checking for arrived events, we know that 
any preceding swapbuffers (that govern the reuse of back buffers) will 
have flushed xcb. However when we explicitly wait for special events, 
any forgotten flush would be catastrophic, causing a deadlock.


We should probably audit the code for unnecessary xcb flushes, but as a 
follow-up patch.


Fair enough.

Acked-by: Nicolai Hähnle 




/Thomas



Apart from that, the patch does indeed look like a good first step, 
though Keep in mind that I'm not too familiar with this stuff...


Cheers,
Nicolai





--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] loader/dri3: Improve dri3 thread-safety

2017-11-06 Thread Thomas Hellstrom

On 11/06/2017 12:14 PM, Nicolai Hähnle wrote:

On 03.11.2017 12:02, Thomas Hellstrom wrote:
It turned out that with recent changes that call into dri3 from 
glFinish(),
it appears like different thread end up waiting for X events 
simultaneously,
causing deadlocks since they steal events from eachoter and update 
the dri3

counters behind eachothers backs.

This patch intends to improve on that. It allows at most one thread at a
time to wait on events for a single drawable. If another thread 
intends to
do the same, it's put to sleep until the first thread finishes 
waiting, and
then it rechecks counters and optionally retries the waiting. Threads 
that

poll for X events never pulls X events off the event queue if there are
other threads waiting for events on that drawable. Counters in the
dri3 drawable structure are protected by a mutex. Finally, the mutex we
introduce is never held while waiting for the X server to avoid
unnecessary stalls.

This does not make dri3 drawables completely thread-safe but at least 
it's a

first step.

Bugzilla: 
https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.freedesktop.org_show-5Fbug.cgi-3Fid-3D102358=DwICaQ=uilaK90D4TOVoH58JNXRgQ=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ=yqE5Xb9bQg5hA8gP7s0b3dSSZoPWaAEKACqD8qfhdZo=wf-xBzPlZ805RCV1hnDgoyW0fYe2ZX3Qwie66SU936g=
Fixes: d5ba75f8881 "st/dri2 Plumb the flush_swapbuffer functionality 
through to dri3"

Signed-off-by: Thomas Hellstrom 
---
  src/loader/loader_dri3_helper.c | 77 
+++--

  src/loader/loader_dri3_helper.h | 10 ++
  2 files changed, 69 insertions(+), 18 deletions(-)

diff --git a/src/loader/loader_dri3_helper.c 
b/src/loader/loader_dri3_helper.c

index 19ab581..7e6b8b2 100644
--- a/src/loader/loader_dri3_helper.c
+++ b/src/loader/loader_dri3_helper.c
@@ -32,7 +32,6 @@
    #include 
  -#include 
  #include "loader_dri3_helper.h"
    /* From xmlpool/options.h, user exposed so should be stable */
@@ -186,8 +185,11 @@ dri3_fence_await(xcb_connection_t *c, struct 
loader_dri3_drawable *draw,

  {
 xcb_flush(c);
 xshmfence_await(buffer->shm_fence);
-   if (draw)
+   if (draw) {
+  mtx_lock(>mtx);
    dri3_flush_present_events(draw);
+  mtx_unlock(>mtx);
+   }
  }
    static void
@@ -245,6 +247,9 @@ loader_dri3_drawable_fini(struct 
loader_dri3_drawable *draw)

    xcb_discard_reply(draw->conn, cookie.sequence);
    xcb_unregister_for_special_event(draw->conn, 
draw->special_event);

 }
+
+   cnd_destroy(>event_cnd);
+   mtx_destroy(>mtx);
  }
    int
@@ -276,6 +281,8 @@ loader_dri3_drawable_init(xcb_connection_t *conn,
   draw->cur_blit_source = -1;
 draw->back_format = __DRI_IMAGE_FORMAT_NONE;
+   mtx_init(>mtx, mtx_plain);
+   cnd_init(>event_cnd);
   if (draw->ext->config)
draw->ext->config->configQueryi(draw->dri_screen,
@@ -407,13 +414,27 @@ dri3_handle_present_event(struct 
loader_dri3_drawable *draw,

  }
    static bool
-dri3_wait_for_event(struct loader_dri3_drawable *draw)
+dri3_wait_for_event_locked(struct loader_dri3_drawable *draw)
  {
 xcb_generic_event_t *ev;
 xcb_present_generic_event_t *ge;
   xcb_flush(draw->conn);


(Why) Doesn't the flush need to be protected as well? Can it be 
removed entirely given that it's already called from 
loader_dri3_wait_for_msc? Though dri3_find_back is different - why?




Thanks for reviewing.

AFAIK, (correc me if I'm wrong) xcb should be thread-safe enough to 
allow multiple threads to call xcb_flush() simultaneously so we 
shouldn't need to explicitly "serialize" it.


There might indeed be unnecessary xcb_flushes(). In the dri3_find_back 
case when we're initially just checking for arrived events, we know that 
any preceding swapbuffers (that govern the reuse of back buffers) will 
have flushed xcb. However when we explicitly wait for special events, 
any forgotten flush would be catastrophic, causing a deadlock.


We should probably audit the code for unnecessary xcb flushes, but as a 
follow-up patch.


/Thomas



Apart from that, the patch does indeed look like a good first step, 
though Keep in mind that I'm not too familiar with this stuff...


Cheers,
Nicolai


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] loader/dri3: Improve dri3 thread-safety

2017-11-06 Thread Nicolai Hähnle

On 03.11.2017 12:02, Thomas Hellstrom wrote:

It turned out that with recent changes that call into dri3 from glFinish(),
it appears like different thread end up waiting for X events simultaneously,
causing deadlocks since they steal events from eachoter and update the dri3
counters behind eachothers backs.

This patch intends to improve on that. It allows at most one thread at a
time to wait on events for a single drawable. If another thread intends to
do the same, it's put to sleep until the first thread finishes waiting, and
then it rechecks counters and optionally retries the waiting. Threads that
poll for X events never pulls X events off the event queue if there are
other threads waiting for events on that drawable. Counters in the
dri3 drawable structure are protected by a mutex. Finally, the mutex we
introduce is never held while waiting for the X server to avoid
unnecessary stalls.

This does not make dri3 drawables completely thread-safe but at least it's a
first step.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102358
Fixes: d5ba75f8881 "st/dri2 Plumb the flush_swapbuffer functionality through to 
dri3"
Signed-off-by: Thomas Hellstrom 
---
  src/loader/loader_dri3_helper.c | 77 +++--
  src/loader/loader_dri3_helper.h | 10 ++
  2 files changed, 69 insertions(+), 18 deletions(-)

diff --git a/src/loader/loader_dri3_helper.c b/src/loader/loader_dri3_helper.c
index 19ab581..7e6b8b2 100644
--- a/src/loader/loader_dri3_helper.c
+++ b/src/loader/loader_dri3_helper.c
@@ -32,7 +32,6 @@
  
  #include 
  
-#include 

  #include "loader_dri3_helper.h"
  
  /* From xmlpool/options.h, user exposed so should be stable */

@@ -186,8 +185,11 @@ dri3_fence_await(xcb_connection_t *c, struct 
loader_dri3_drawable *draw,
  {
 xcb_flush(c);
 xshmfence_await(buffer->shm_fence);
-   if (draw)
+   if (draw) {
+  mtx_lock(>mtx);
dri3_flush_present_events(draw);
+  mtx_unlock(>mtx);
+   }
  }
  
  static void

@@ -245,6 +247,9 @@ loader_dri3_drawable_fini(struct loader_dri3_drawable *draw)
xcb_discard_reply(draw->conn, cookie.sequence);
xcb_unregister_for_special_event(draw->conn, draw->special_event);
 }
+
+   cnd_destroy(>event_cnd);
+   mtx_destroy(>mtx);
  }
  
  int

@@ -276,6 +281,8 @@ loader_dri3_drawable_init(xcb_connection_t *conn,
  
 draw->cur_blit_source = -1;

 draw->back_format = __DRI_IMAGE_FORMAT_NONE;
+   mtx_init(>mtx, mtx_plain);
+   cnd_init(>event_cnd);
  
 if (draw->ext->config)

draw->ext->config->configQueryi(draw->dri_screen,
@@ -407,13 +414,27 @@ dri3_handle_present_event(struct loader_dri3_drawable 
*draw,
  }
  
  static bool

-dri3_wait_for_event(struct loader_dri3_drawable *draw)
+dri3_wait_for_event_locked(struct loader_dri3_drawable *draw)
  {
 xcb_generic_event_t *ev;
 xcb_present_generic_event_t *ge;
  
 xcb_flush(draw->conn);


(Why) Doesn't the flush need to be protected as well? Can it be removed 
entirely given that it's already called from loader_dri3_wait_for_msc? 
Though dri3_find_back is different - why?


Apart from that, the patch does indeed look like a good first step, 
though Keep in mind that I'm not too familiar with this stuff...


Cheers,
Nicolai



-   ev = xcb_wait_for_special_event(draw->conn, draw->special_event);
+
+   /* Only have one thread waiting for events at a time */
+   if (draw->has_event_waiter) {
+  cnd_wait(>event_cnd, >mtx);
+  /* Another thread has updated the protected info, so retest. */
+  return true;
+   } else {
+  draw->has_event_waiter = true;
+  /* Allow other threads access to the drawable while we're waiting. */
+  mtx_unlock(>mtx);
+  ev = xcb_wait_for_special_event(draw->conn, draw->special_event);
+  mtx_lock(>mtx);
+  draw->has_event_waiter = false;
+  cnd_broadcast(>event_cnd);
+   }
 if (!ev)
return false;
 ge = (void *) ev;
@@ -442,19 +463,23 @@ loader_dri3_wait_for_msc(struct loader_dri3_drawable 
*draw,
divisor,
remainder);
  
+   mtx_lock(>mtx);

 xcb_flush(draw->conn);
  
 /* Wait for the event */

 if (draw->special_event) {
while ((int32_t) (msc_serial - draw->recv_msc_serial) > 0) {
- if (!dri3_wait_for_event(draw))
+ if (!dri3_wait_for_event_locked(draw)) {
+mtx_unlock(>mtx);
  return false;
+ }
}
 }
  
 *ust = draw->notify_ust;

 *msc = draw->notify_msc;
 *sbc = draw->recv_sbc;
+   mtx_unlock(>mtx);
  
 return true;

  }
@@ -476,17 +501,21 @@ loader_dri3_wait_for_sbc(struct loader_dri3_drawable 
*draw,
  *  swaps requested with glXSwapBuffersMscOML for that window have
  *  completed."
  */
+   mtx_lock(>mtx);
 if (!target_sbc)
target_sbc = draw->send_sbc;
  
 while (draw->recv_sbc < target_sbc) {

-  if (!dri3_wait_for_event(draw))

[Mesa-dev] [PATCH] loader/dri3: Improve dri3 thread-safety

2017-11-03 Thread Thomas Hellstrom
It turned out that with recent changes that call into dri3 from glFinish(),
it appears like different thread end up waiting for X events simultaneously,
causing deadlocks since they steal events from eachoter and update the dri3
counters behind eachothers backs.

This patch intends to improve on that. It allows at most one thread at a
time to wait on events for a single drawable. If another thread intends to
do the same, it's put to sleep until the first thread finishes waiting, and
then it rechecks counters and optionally retries the waiting. Threads that
poll for X events never pulls X events off the event queue if there are
other threads waiting for events on that drawable. Counters in the
dri3 drawable structure are protected by a mutex. Finally, the mutex we
introduce is never held while waiting for the X server to avoid
unnecessary stalls.

This does not make dri3 drawables completely thread-safe but at least it's a
first step.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102358
Fixes: d5ba75f8881 "st/dri2 Plumb the flush_swapbuffer functionality through to 
dri3"
Signed-off-by: Thomas Hellstrom 
---
 src/loader/loader_dri3_helper.c | 77 +++--
 src/loader/loader_dri3_helper.h | 10 ++
 2 files changed, 69 insertions(+), 18 deletions(-)

diff --git a/src/loader/loader_dri3_helper.c b/src/loader/loader_dri3_helper.c
index 19ab581..7e6b8b2 100644
--- a/src/loader/loader_dri3_helper.c
+++ b/src/loader/loader_dri3_helper.c
@@ -32,7 +32,6 @@
 
 #include 
 
-#include 
 #include "loader_dri3_helper.h"
 
 /* From xmlpool/options.h, user exposed so should be stable */
@@ -186,8 +185,11 @@ dri3_fence_await(xcb_connection_t *c, struct 
loader_dri3_drawable *draw,
 {
xcb_flush(c);
xshmfence_await(buffer->shm_fence);
-   if (draw)
+   if (draw) {
+  mtx_lock(>mtx);
   dri3_flush_present_events(draw);
+  mtx_unlock(>mtx);
+   }
 }
 
 static void
@@ -245,6 +247,9 @@ loader_dri3_drawable_fini(struct loader_dri3_drawable *draw)
   xcb_discard_reply(draw->conn, cookie.sequence);
   xcb_unregister_for_special_event(draw->conn, draw->special_event);
}
+
+   cnd_destroy(>event_cnd);
+   mtx_destroy(>mtx);
 }
 
 int
@@ -276,6 +281,8 @@ loader_dri3_drawable_init(xcb_connection_t *conn,
 
draw->cur_blit_source = -1;
draw->back_format = __DRI_IMAGE_FORMAT_NONE;
+   mtx_init(>mtx, mtx_plain);
+   cnd_init(>event_cnd);
 
if (draw->ext->config)
   draw->ext->config->configQueryi(draw->dri_screen,
@@ -407,13 +414,27 @@ dri3_handle_present_event(struct loader_dri3_drawable 
*draw,
 }
 
 static bool
-dri3_wait_for_event(struct loader_dri3_drawable *draw)
+dri3_wait_for_event_locked(struct loader_dri3_drawable *draw)
 {
xcb_generic_event_t *ev;
xcb_present_generic_event_t *ge;
 
xcb_flush(draw->conn);
-   ev = xcb_wait_for_special_event(draw->conn, draw->special_event);
+
+   /* Only have one thread waiting for events at a time */
+   if (draw->has_event_waiter) {
+  cnd_wait(>event_cnd, >mtx);
+  /* Another thread has updated the protected info, so retest. */
+  return true;
+   } else {
+  draw->has_event_waiter = true;
+  /* Allow other threads access to the drawable while we're waiting. */
+  mtx_unlock(>mtx);
+  ev = xcb_wait_for_special_event(draw->conn, draw->special_event);
+  mtx_lock(>mtx);
+  draw->has_event_waiter = false;
+  cnd_broadcast(>event_cnd);
+   }
if (!ev)
   return false;
ge = (void *) ev;
@@ -442,19 +463,23 @@ loader_dri3_wait_for_msc(struct loader_dri3_drawable 
*draw,
   divisor,
   remainder);
 
+   mtx_lock(>mtx);
xcb_flush(draw->conn);
 
/* Wait for the event */
if (draw->special_event) {
   while ((int32_t) (msc_serial - draw->recv_msc_serial) > 0) {
- if (!dri3_wait_for_event(draw))
+ if (!dri3_wait_for_event_locked(draw)) {
+mtx_unlock(>mtx);
 return false;
+ }
   }
}
 
*ust = draw->notify_ust;
*msc = draw->notify_msc;
*sbc = draw->recv_sbc;
+   mtx_unlock(>mtx);
 
return true;
 }
@@ -476,17 +501,21 @@ loader_dri3_wait_for_sbc(struct loader_dri3_drawable 
*draw,
 *  swaps requested with glXSwapBuffersMscOML for that window have
 *  completed."
 */
+   mtx_lock(>mtx);
if (!target_sbc)
   target_sbc = draw->send_sbc;
 
while (draw->recv_sbc < target_sbc) {
-  if (!dri3_wait_for_event(draw))
+  if (!dri3_wait_for_event_locked(draw)) {
+ mtx_unlock(>mtx);
  return 0;
+  }
}
 
*ust = draw->ust;
*msc = draw->msc;
*sbc = draw->recv_sbc;
+   mtx_unlock(>mtx);
return 1;
 }
 
@@ -499,16 +528,16 @@ static int
 dri3_find_back(struct loader_dri3_drawable *draw)
 {
int b;
-   xcb_generic_event_t *ev;
-   xcb_present_generic_event_t *ge;
-   int num_to_consider = draw->num_back;
+   int num_to_consider;
 
+