On Wednesday 09 January 2008 23:53, Roland Dreier wrote:

> Isn't that just a crazy way of coming up with a new name for unsigned long?
> 
> How about something like this (it seems to generate pretty good code
> on x86-64 at least):
> 
> diff --git a/src/qp.c b/src/qp.c
> index 8b4adaa..5721860 100644
> --- a/src/qp.c
> +++ b/src/qp.c
> @@ -168,6 +168,20 @@ static void set_data_seg(struct mlx4_wqe_data_seg *dseg, 
> struct ibv_sge *sg)
>       dseg->byte_count = htonl(sg->length);
>  }
>  
> +/*
> + * Avoid using memcpy() to copy to BlueFlame page, since memcpy()
> + * implementations may use move-string-buffer assembler instructions,
> + * which do not guarantee order of copying.
> + */
> +static void mlx4_bf_copy(unsigned long *dst, unsigned long *src, unsigned 
> bytecnt)
> +{
> +     while (bytecnt > 0) {
> +             *dst++ = *src++;
> +             *dst++ = *src++;
> +             bytecnt -= 2 * sizeof (long);
> +     }
> +}
> +

Agreed regarding the unsigned long.  However, your solution still results in a 
procedure call
(mlx4_bf_copy is compiled as a procedure using gcc 4.1.0 on an X86_64 host, 
even if I add "inline").

I would prefer the patch below (which does generate inline code, and does the
(sizeof(unsigned long) * 2) calculation just once).

- Jack
=======================
diff --git a/src/qp.c b/src/qp.c
index bced740..8fc8450 100644
--- a/src/qp.c
+++ b/src/qp.c
@@ -391,7 +391,24 @@ out:
 
                pthread_spin_lock(&ctx->bf_lock);
 
-               memcpy(ctx->bf_page + ctx->bf_offset, ctrl, align(size * 16, 
64));
+               /*
+                * Avoid using memcpy to copy to BlueFlame page, since recent
+                * memcpy implementations use move-string-buffer assembler
+                * instructions, which do not guarantee order of copying.
+                */
+
+               {
+                       unsigned long *target =
+                               (unsigned long *) (ctx->bf_page + 
ctx->bf_offset);
+                       unsigned long *src = (unsigned long *) ctrl;
+                       int n = align(size * 16, 64) / (sizeof(unsigned long) * 
2);
+                       while (n > 0) {
+                               *target++ = *src++;
+                               *target++ = *src++;
+                               --n;
+                       }
+               }
+
                wc_wmb();
 
                ctx->bf_offset ^= ctx->bf_buf_size;
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Reply via email to