On 23.10.2017 13:50, Grazvydas Ignotas wrote:
On Sun, Oct 22, 2017 at 9:33 PM, Nicolai Hähnle <nhaeh...@gmail.com> wrote:
From: Nicolai Hähnle <nicolai.haeh...@amd.com>

Fences are now 4 bytes instead of 96 bytes (on my 64-bit system).

Signaling a fence is a single atomic operation in the fast case plus a
syscall in the slow case.

Testing if a fence is signaled is the same as before (a simple comparison),
but waiting on a fence is now no more expensive than just testing it in
the fast (already signaled) case.
---
  src/util/futex.h   |  4 +++
  src/util/u_queue.c |  2 ++
  src/util/u_queue.h | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
  3 files changed, 94 insertions(+)

...

diff --git a/src/util/u_queue.h b/src/util/u_queue.h
index a3e12260e30..3d9f19f4e6c 100644
--- a/src/util/u_queue.h
+++ b/src/util/u_queue.h
@@ -28,30 +28,117 @@
   *
   * Jobs can be added from any thread. After that, the wait call can be used
   * to wait for completion of the job.
   */

  #ifndef U_QUEUE_H
  #define U_QUEUE_H

  #include <string.h>

+#include "util/futex.h"
  #include "util/list.h"
+#include "util/macros.h"
  #include "util/u_thread.h"

  #ifdef __cplusplus
  extern "C" {
  #endif

  #define UTIL_QUEUE_INIT_USE_MINIMUM_PRIORITY      (1 << 0)
  #define UTIL_QUEUE_INIT_RESIZE_IF_FULL            (1 << 1)

+#if defined(__GNUC__) && defined(HAVE_FUTEX)
+#define UTIL_QUEUE_FENCE_FUTEX
+#else
+#define UTIL_QUEUE_FENCE_STANDARD
+#endif
+
+#ifdef UTIL_QUEUE_FENCE_FUTEX
+/* Job completion fence.
+ * Put this into your job structure.
+ */
+struct util_queue_fence {
+   /* The fence can be in one of three states:
+    *  0 - signaled
+    *  1 - unsignaled
+    *  2 - unsignaled, may have waiters
+    */
+   uint32_t val;
+};
+
+static inline void
+util_queue_fence_init(struct util_queue_fence *fence)
+{
+   fence->val = 0;
+}
+
+static inline void
+util_queue_fence_destroy(struct util_queue_fence *fence)
+{
+   assert(fence->val == 0);
+   /* no-op */
+}
+
+static inline void
+util_queue_fence_wait(struct util_queue_fence *fence)
+{
+   uint32_t v = fence->val;
+
+   if (likely(v == 0))
+      return;
+
+   do {
+      if (v != 2)
+         v = __sync_val_compare_and_swap(&fence->val, 1, 2);
+
+      futex_wait(&fence->val, 2);
+      v = fence->val;
+   } while(v != 0);
+}
+
+static inline void
+util_queue_fence_signal(struct util_queue_fence *fence)
+{
+   uint32_t val = __sync_lock_test_and_set(&fence->val, 0);

As this is _signal(), don't you need a full barrier here?

You're right. It's a bit surprising that __sync_lock_test_and_set isn't one.


+
+   assert(val != 0);
+
+   if (val == 2)

The documentation says some architectures may only support a constant
of 1 here...

Again, surprising, but you're right.

I believe both of these could be addressed by using __sync_val_compare_and_swap instead, right?

Or we could move to new-style gcc atomic built-ins, but I'm not sure off-hand how widely available those are.

Thanks,
Nicolai


+      futex_wake_all(&fence->val);
+}

Gražvydas


_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to