Hi,

  here is a respin of the buffer allocation optimization patch.

Changes V2 -> V3:
----------------

  - Allocate all aligned buffers with mmap(), not only QP buffers.

Changes V1 -> V2:
----------------

  - Use mmap() regardless of the page size, not only with 64K pages.



  Buffers are allocated with mthca_alloc_buf(), which rounds the buffer
size up to the page size and then allocates page-aligned memory using
posix_memalign().

  However, this allocation is quite wasteful on architectures using 64K
pages (ia64 for example), because the rounded size then exceeds glibc's
MMAP_THRESHOLD malloc parameter and chunks are allocated with mmap(). Thus
we end up allocating:

(requested size rounded to the page size) + (page size) + (malloc overhead)

rounded internally to the page size.

  So for example, if we request a buffer of page_size bytes, we end up
consuming 3 pages. In short, for each buffer we allocate, there is an
overhead of 2 pages. This is especially visible on large clusters, where
the number of QPs can reach several thousand: with 64K pages, an
illustrative 4,096 such buffers would waste 2 * 64K * 4096 = 512 MB.
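
  In code form, the same arithmetic as a minimal sketch (illustrative
values only; the 16-byte malloc chunk header is an assumption, not a
measured glibc figure):

#include <stdio.h>

int main(void)
{
	size_t page_size = 65536;	/* e.g. 64K pages on ia64 */
	size_t request   = page_size;	/* caller asks for one page */
	size_t overhead  = 16;		/* assumed malloc chunk header */

	/* (request rounded to the page size) + page_size + malloc
	 * overhead, rounded again to the page size. */
	size_t rounded = (request + page_size - 1) / page_size * page_size;
	size_t total   = (rounded + page_size + overhead + page_size - 1)
			 / page_size * page_size;

	/* Prints "3 pages consumed for a 65536-byte request". */
	printf("%zu pages consumed for a %zu-byte request\n",
	       total / page_size, request);
	return 0;
}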

  This patch replaces the call to posix_memalign() in mthca_alloc_buf() with
a direct call to mmap().
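
  For reference, here is a minimal standalone sketch of the same technique
(a hypothetical demo, not part of the patch; assumes Linux, where
MAP_ANONYMOUS is available):

#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round size up to a multiple of page_size (mirrors align() in buf.c,
 * assuming a power-of-two page size). */
static size_t align_up(size_t size, size_t page_size)
{
	return (size + page_size - 1) & ~(page_size - 1);
}

int main(void)
{
	size_t page_size = sysconf(_SC_PAGESIZE);
	size_t len = align_up(1, page_size);	/* even 1 byte costs a page */
	void *buf;

	/* mmap() returns page-aligned memory with no allocator
	 * bookkeeping, so exactly one page is consumed here. */
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return errno;
	}

	printf("allocated %zu bytes at %p\n", len, buf);
	munmap(buf, len);
	return 0;
}

  Note that MAP_ANONYMOUS is not specified by POSIX; on other systems the
usual fallback is mapping /dev/zero.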

Signed-off-by: Sebastien Dugue <[email protected]>
---
 src/buf.c |   21 +++++++++++++--------
 1 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/src/buf.c b/src/buf.c
index 6c1be4f..985c1f7 100644
--- a/src/buf.c
+++ b/src/buf.c
@@ -35,6 +35,8 @@
 #endif /* HAVE_CONFIG_H */
 
 #include <stdlib.h>
+#include <sys/mman.h>
+#include <errno.h>
 
 #include "mthca.h"
 
@@ -61,16 +63,19 @@ int mthca_alloc_buf(struct mthca_buf *buf, size_t size, int page_size)
 {
        int ret;
 
-       ret = posix_memalign(&buf->buf, page_size, align(size, page_size));
-       if (ret)
-               return ret;
+       /* Use mmap directly to allocate an aligned buffer */
+       buf->buf = mmap(NULL, align(size, page_size), PROT_READ | PROT_WRITE,
+                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+
+       if (buf->buf == MAP_FAILED)
+               return errno;
 
        ret = ibv_dontfork_range(buf->buf, size);
-       if (ret)
-               free(buf->buf);
 
-       if (!ret)
-               buf->length = size;
+       if (ret)
+               munmap(buf->buf, align(size, page_size));
+       else
+               buf->length = align(size, page_size);
 
        return ret;
 }
@@ -78,5 +83,5 @@ int mthca_alloc_buf(struct mthca_buf *buf, size_t size, int page_size)
 void mthca_free_buf(struct mthca_buf *buf)
 {
        ibv_dofork_range(buf->buf, buf->length);
-       free(buf->buf);
+       munmap(buf->buf, buf->length);
 }
-- 
1.6.3.1
