Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-14 Thread Woodruff, Robert J
Robert Walsh wrote, 
 
 [EMAIL PROTECTED] bin]$ ./ib_rdma_bw -n 1 -t 1000 -s 200 rkl-12
 4730: | port=18515 | ib_port=1 | size=200 | tx_depth=1000 |
 iters=1 | duplex=0 | cma=0 |
 4730: Local address:  LID 0x03, QPN 0x001d, PSN 0x9e070c RKey
0x2302400
 VAddr 0x2a95dd3480
 4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey
0x2402500
 VAddr 0x2a95c85480
 4730:main: Completion with error at client:
 4730:main: Failed status 9: wr_id 3
 4730:main: scnt=7584, ccnt=6584
 [EMAIL PROTECTED] bin]$  

Hi Woody,
Robert Walsh wrote, 
When RC4 is available, there should be a patch in there that will fix
this.  Can you let us know if you continue to see problems?

Regards,
 Robert.

I installed RC5 and now it just hangs, 

[EMAIL PROTECTED] bin]$ ./ib_rdma_bw -n 1 -t 1000 -s 200 rkl-12
4702: | port=18515 | ib_port=1 | size=200 | tx_depth=1000 |
iters=1 | duplex=0 | cma=0 |
4702: Local address:  LID 0x03, QPN 0x000d, PSN 0xf1b711 RKey 0x1101200
VAddr 0x2a95dc8480
4702: Remote address: LID 0x04, QPN 0x000d, PSN 0xe62247, RKey 0x1101200
VAddr 0x2a95c7c480
It hangs here and I have to Ctrl-C the test.


Intel MPI also fails with, 
# Barrier
[1][rdma_iba.c:260] Intel MPI fatal error: DTO operation completed with
error. status=0x8. cookie=0x514ee0
rank 1 in job 4  rkl-13_32779   caused collective abort of all ranks
  exit status of rank 1: killed by signal 9 

woody

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-14 Thread Robert Walsh
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

> I installed RC5 and now it just hangs,
>
> [EMAIL PROTECTED] bin]$ ./ib_rdma_bw -n 1 -t 1000 -s 200 rkl-12
> 4702: | port=18515 | ib_port=1 | size=200 | tx_depth=1000 |
> iters=1 | duplex=0 | cma=0 |
> 4702: Local address:  LID 0x03, QPN 0x000d, PSN 0xf1b711 RKey 0x1101200
> VAddr 0x2a95dc8480
> 4702: Remote address: LID 0x04, QPN 0x000d, PSN 0xe62247, RKey 0x1101200
> VAddr 0x2a95c7c480
> hangs here and have to cntrl-c the test.
>
>
> Intel MPI also fails with,
> # Barrier
> [1][rdma_iba.c:260] Intel MPI fatal error: DTO operation completed with
> error. status=0x8. cookie=0x514ee0
> rank 1 in job 4  rkl-13_32779   caused collective abort of all ranks
>   exit status of rank 1: killed by signal 9

OK - thanks for the report - I'll look into it.

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRQm6fvzvnpzTd9fxAQKmiggAhKyznnhzO3ndlYYJx58cSX8XK/R5WNz0
CVhrKxVtjhq+cYaP6HAC9HmwuhMm18vlHGmw8fvoiwrhYP1h7dxaVgiAt9dX2rRz
svPd4rZnfIu+L9oZYmy7XBkfawwQR30IZPSUbfQDU1ag2r44HsnyZ6VpKucuHLfL
jUFxryC2lmwAU6GhuTKJ8k7XEEQBL3UoczPfL/PTwpFVYvM8CjMgLjwhIfqH++Hv
khciAfsl8HgK5Hd6jj1WCOzMyZmL7GBGrpTsia/hgUGOHkpmEC9wy3dSDZeIqCbI
4cs961Y2TIuciNraaLPbF4mhFFgaLJe4nzxSeTLfcbfxXraSqKbn9Q==
=pWln
-END PGP SIGNATURE-




Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-14 Thread Robert Walsh
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

> I installed RC5 and now it just hangs,

Wow - we can't even get RC5 to build here.  What distro are you running?

I've tried this on RC4 + a fixed libipathverbs package and it runs OK
(although it does take a while, which might explain the hang you were
seeing.)

But mostly I'm curious how you get RC5 to build at all.

We really really really shouldn't be attempting to turn RC's around as
fast as RC4 to RC5 went: we basically had about enough time to throw a
patch together without being able to do much testing.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRQn2QPzvnpzTd9fxAQJFogf/fJidIu6UVaSTbGMyia66kgYrtrL5lvtr
FcmyBI01SbjOUnd9rfejt0y1IeN+1O88wBBJBnQPSi3aRUmCufuGYRWM9T2ZXmw8
PxCLyN44AvyF/B6SUfwr8ygXcAQ2nJPvxfdpnEyFlTxBf5gatDg00YiSRu88NtxR
5DrDsK/8OSpy6j0lRVoB7hJh2cs74NhtXawvvzlmGBI4ZhoTmifNPSmPnXwMHJ7+
a4A+dK1cSqjLFUXDh6WPIM5OHS6bKbQeKQ3J4H+I99uK+5n3fb/9CP+Z/aZ3/JEG
Qg9dfgsF4onKNBDsXPoGHjI1iU+FOghLFZCTvYXirkqXPgVsTAVK5A==
=hwu5
-END PGP SIGNATURE-




Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-14 Thread Robert Walsh
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Woodruff, Robert J wrote:
 Robert Walsh wrote, 
 [EMAIL PROTECTED] bin]$ ./ib_rdma_bw -n 1 -t 1000 -s 200 rkl-12
 4730: | port=18515 | ib_port=1 | size=200 | tx_depth=1000 |
 iters=1 | duplex=0 | cma=0 |
 4730: Local address:  LID 0x03, QPN 0x001d, PSN 0x9e070c RKey
 0x2302400
 VAddr 0x2a95dd3480
 4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey
 0x2402500
 VAddr 0x2a95c85480
 4730:main: Completion with error at client:
 4730:main: Failed status 9: wr_id 3
 4730:main: scnt=7584, ccnt=6584
 [EMAIL PROTECTED] bin]$  
 
 Hi Woody,
 Robert Walsh wrote, 
 When RC4 is available, there should be a patch in there that will fix
 this.  Can you let us know if you continue to see problems?
 
 Regards,
 Robert.
 
 I installed RC5 and now it just hangs, 
 
 [EMAIL PROTECTED] bin]$ ./ib_rdma_bw -n 1 -t 1000 -s 200 rkl-12
 4702: | port=18515 | ib_port=1 | size=200 | tx_depth=1000 |
 iters=1 | duplex=0 | cma=0 |
 4702: Local address:  LID 0x03, QPN 0x000d, PSN 0xf1b711 RKey 0x1101200
 VAddr 0x2a95dc8480
 4702: Remote address: LID 0x04, QPN 0x000d, PSN 0xe62247, RKey 0x1101200
 VAddr 0x2a95c7c480
 hangs here and have to cntrl-c the test.
 
 
 Intel MPI also fails with, 
 # Barrier
 [1][rdma_iba.c:260] Intel MPI fatal error: DTO operation completed with
 error. status=0x8. cookie=0x514ee0
 rank 1 in job 4  rkl-13_32779   caused collective abort of all ranks
   exit status of rank 1: killed by signal 9 

Hi Woody,

So, we built everything using RC5 plus the libipathverbs from subversion,
and we were able to run ib_rdma_bw successfully (with your arguments
above) and Intel MPI (a simple MPI hello world program).  I'm going to
continue testing with the Intel MPI test suite and some ISV
applications.

I'll keep you informed.

Regards,
 Robert.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRQoATfzvnpzTd9fxAQLUKQf9E1ps9XbbXplMm6+5O/XDdlWF0BQws1SC
L/aGygh34fZSkpGmCrfze3HhsaOqasu9gUOsJQ89jX6pKNkv4tJAxSJCr+n+bdG3
21Bqr9gcM0MbzrDvOcUDHqvnmC0THlCf0XhikjKg/FJR1e48BIiAOFUzfi0VvI36
G1ZtD8xZXydOfWq7Z4xvyf9Y3qNPIeSKR2JZGJQoGHjxY4+vcteK0UVHfic1Bgpy
9uql47af6tncN+CazYcwf8xnHegiDr34iEEre5wUz//Qy62j8JNPnxhit0W9lXij
zFszTkOHQeibxbFWi9ZRyigTmHanxxRUuznW54NL8NIF30jhnmcksQ==
=06gu
-END PGP SIGNATURE-




Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-14 Thread Michael S. Tsirkin
Well, it looks like the libipathverbs that went into 1.1 branch was botched.
How come?
Please note that Mellanox for one is unable to test libipathverbs at all.
libipathverbs maintainers, please, try to fix by Sunday.
And please, test the changes before you commit them.


Quoting r. Robert Walsh [EMAIL PROTECTED]:
Subject: Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Woodruff, Robert J wrote:
 Robert Walsh wrote, 
 [EMAIL PROTECTED] bin]$ ./ib_rdma_bw -n 1 -t 1000 -s 200 rkl-12
 4730: | port=18515 | ib_port=1 | size=200 | tx_depth=1000 |
 iters=1 | duplex=0 | cma=0 |
 4730: Local address:  LID 0x03, QPN 0x001d, PSN 0x9e070c RKey
 0x2302400
 VAddr 0x2a95dd3480
 4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey
 0x2402500
 VAddr 0x2a95c85480
 4730:main: Completion with error at client:
 4730:main: Failed status 9: wr_id 3
 4730:main: scnt=7584, ccnt=6584
 [EMAIL PROTECTED] bin]$  
 
 Hi Woody,
 Robert Walsh wrote, 
 When RC4 is available, there should be a patch in there that will fix
 this.  Can you let us know if you continue to see problems?
 
 Regards,
 Robert.
 
 I installed RC5 and now it just hangs, 
 
 [EMAIL PROTECTED] bin]$ ./ib_rdma_bw -n 1 -t 1000 -s 200 rkl-12
 4702: | port=18515 | ib_port=1 | size=200 | tx_depth=1000 |
 iters=1 | duplex=0 | cma=0 |
 4702: Local address:  LID 0x03, QPN 0x000d, PSN 0xf1b711 RKey 0x1101200
 VAddr 0x2a95dc8480
 4702: Remote address: LID 0x04, QPN 0x000d, PSN 0xe62247, RKey 0x1101200
 VAddr 0x2a95c7c480
 hangs here and have to cntrl-c the test.
 
 
 Intel MPI also fails with, 
 # Barrier
 [1][rdma_iba.c:260] Intel MPI fatal error: DTO operation completed with
 error. status=0x8. cookie=0x514ee0
 rank 1 in job 4  rkl-13_32779   caused collective abort of all ranks
   exit status of rank 1: killed by signal 9 

Hi Woody,

So, we built everything using RC5 plus the libipathverbs from subversion
and we were successfully able to run ib_rdma_bw (with your arguments
above) and Intel MPI (a simple MPI hello world program).  I'm going to
continue testing with the Intel MPI testsuite and some applications ISV
applications.

I'll keep you informed.

Regards,
 Robert.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRQoATfzvnpzTd9fxAQLUKQf9E1ps9XbbXplMm6+5O/XDdlWF0BQws1SC
L/aGygh34fZSkpGmCrfze3HhsaOqasu9gUOsJQ89jX6pKNkv4tJAxSJCr+n+bdG3
21Bqr9gcM0MbzrDvOcUDHqvnmC0THlCf0XhikjKg/FJR1e48BIiAOFUzfi0VvI36
G1ZtD8xZXydOfWq7Z4xvyf9Y3qNPIeSKR2JZGJQoGHjxY4+vcteK0UVHfic1Bgpy
9uql47af6tncN+CazYcwf8xnHegiDr34iEEre5wUz//Qy62j8JNPnxhit0W9lXij
zFszTkOHQeibxbFWi9ZRyigTmHanxxRUuznW54NL8NIF30jhnmcksQ==
=06gu
-END PGP SIGNATURE-

___
openfabrics-ewg mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openfabrics-ewg

-- 
MST




Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-14 Thread Michael S. Tsirkin
Quoting r. Robert Walsh [EMAIL PROTECTED]:
> Subject: Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
>
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> > I installed RC5 and now it just hangs,
>
> Wow - we can't even get RC5 to build here.  What distro are you running?
>
> I've tried this on RC4 + a fixed libipathverbs package and it runs OK
> (although it does take a while, which might explain the hang you were
> seeing.)
>
> But mostly I'm curious how you get RC5 to build at all.
>
> We really really really shouldn't be attempting to turn RC's around as
> fast as RC4 to RC5 went: we basically had about enough time to throw a
> patch together without being able to do much testing.

Changes are expected to be tested before you commit.
This is really the maintainer's responsibility; please take it seriously.

-- 
MST




Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-13 Thread Woodruff, Robert J
Robert Walsh wrote,
 
 [EMAIL PROTECTED] bin]$ ./ib_rdma_bw -n 1 -t 1000 -s 200 rkl-12
 4730: | port=18515 | ib_port=1 | size=200 | tx_depth=1000 |
 iters=1 | duplex=0 | cma=0 |
 4730: Local address:  LID 0x03, QPN 0x001d, PSN 0x9e070c RKey
0x2302400
 VAddr 0x2a95dd3480
 4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey
0x2402500
 VAddr 0x2a95c85480
 4730:main: Completion with error at client:
 4730:main: Failed status 9: wr_id 3
 4730:main: scnt=7584, ccnt=6584
 [EMAIL PROTECTED] bin]$  

Hi Woody,
Robert Walsh wrote, 
When RC4 is available, there should be a patch in there that will fix
this.  Can you let us know if you continue to see problems?

Regards,
 Robert.

I installed RC4 and now get this, 


[EMAIL PROTECTED] bin]$ ./ib_rdma_bw 
9035: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 | duplex=0 | cma=0 |
libibverbs: Warning: no userspace device-specific driver found for uverbs0
driver search path: /usr/local/ofed/lib64/infiniband
9035:main: No IB devices found

I tried getting the latest OFED 1.1 ipathverbs from svn today, which I
thought would have a fix for this, and I think I got it built OK, although
the Mellanox build environment is less than intuitive. It still seems to
fail, though. Guess we will try again with RC5 tomorrow.

woody




Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-13 Thread Michael S. Tsirkin
Quoting r. Woodruff, Robert J [EMAIL PROTECTED]:
> I tried getting the latest ofed 1.1 ipathverbs from svn today that I
> thought would have a fix for this, and I think I got it built ok, although
> the mellanox build environment is less than intuitive, but it still seems
> to fail. Guess we will try again with RC5 tomorrow.

It's actually the OFED build environment now :)
So you really should report improvement suggestions on the list.

-- 
MST




Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-07 Thread Robert Walsh
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Woodruff, Robert J wrote:
 Robert Walsh wrote,
 I'll give it a spin this afternoon: it looks quite a bit more
 comprehensive than the small patch I did.
 
 I also just tried running the ib_rdma_bw test and it seems to
 be flaky if you stress it. If you just run the defaults, it seems to
 work, but if you crank up the iterations and the message size,
 it sometimes fails with.
 
 [EMAIL PROTECTED] bin]$ ./ib_rdma_bw -n 1 -t 1000 -s 200 rkl-12
 4730: | port=18515 | ib_port=1 | size=200 | tx_depth=1000 |
 iters=1 | duplex=0 | cma=0 |
 4730: Local address:  LID 0x03, QPN 0x001d, PSN 0x9e070c RKey 0x2302400
 VAddr 0x2a95dd3480
 4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey 0x2402500
 VAddr 0x2a95c85480
 4730:main: Completion with error at client:
 4730:main: Failed status 9: wr_id 3
 4730:main: scnt=7584, ccnt=6584
 [EMAIL PROTECTED] bin]$  

Hi Woody,

When RC4 is available, there should be a patch in there that will fix
this.  Can you let us know if you continue to see problems?

Regards,
 Robert.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRQCzfvzvnpzTd9fxAQLfoAf+JWrBo/pPf/tAvTRFckCqjOn3dpH59mJK
n1KuN/M9lsP0UobIOEAMAR3KLvTfFe2czEb7ThMxcKjYgJHiikxuiSomB3pbsRK5
W0qTEqMmS5QYFXfpPlvVof4xxdvWZDDUzzkxG0bve4zBVjeJMUnu/8jVTTBmGbqd
nmqfLrIP+N8n876x1RZade3DTz0NEDDYRT5d25asbUVuoiF7ldVtbX5RmK6rRdFZ
1ym6fIyHT+fTZ5wnVoTJRdjV8icrR9JpPj/BFL6OoxDQvgMksplDnJaTGc4XinFl
WdwZV2NfImYvwSB4QUgqe4Me/BS1xl4gj+OpaviE2TzP7U6tqQVaHQ==
=OLHZ
-END PGP SIGNATURE-




Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-06 Thread Tziporet Koren
Robert Walsh wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

   
 Here is a slightly modified patch for your attributes issue. Can you give it 
 a try?
 

 I rebuilt OFED from scratch with the patch, and ran successfully on
 Intel MPI 2.0.1 with the refresh patch.  I could not get it to run on
 Intel MPI 3.0b.  If you could verify that the fix you mentioned that is
 in the 2.0.1 refresh patch also made it into 3.0b, I'd appreciate it.
 If you have a later beta version you could send me, that would be great,
 too.

 Regards,
  Robert.
   
I added this patch under fixes to OFED 1.1. It will be in RC4.

Tziporet




Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-06 Thread Robert Walsh
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Tziporet Koren wrote:
 Robert Walsh wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

  
 Here is a slightly modified patch for your attributes issue. Can you
 give it a try?
 

 I rebuilt OFED from scratch with the patch, and ran successfully on
 Intel MPI 2.0.1 with the refresh patch.  I could not get it to run on
 Intel MPI 3.0b.  If you could verify that the fix you mentioned that is
 in the 2.0.1 refresh patch also made it into 3.0b, I'd appreciate it.
 If you have a later beta version you could send me, that would be great,
 too.

 Regards,
  Robert.
   
 I added this patch under fixes to OFED 1.1. Will be in RC4

Excellent.  Thanks, Tziporet.

Regards,
 Robert.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRP8V4fzvnpzTd9fxAQLZVAf+IYtLA2c7cBCbzih2Suy4AHUdD1CghC0U
XL+iWjLo4TFbcUhBIrzwG4M72VQanqhNr2Qs3ZtfU2+qN6qKnSZXdejd7nYYOAsz
5LnrWa6Y+9Jfy3K/JOQ4wpjc3lWs3rvuzPTBhmEPcNHZk5+/m0gbfzYLdrc2djPp
soyFSQpyLdpF0J5iY12EWiPYnFK7ConoqYHkTODZV8IjBJIImvDoScouIC+Uzi+x
HlANIlneKa4/zQHNaK+3vZ6N7ZUq30quMZU6ICMI2gzFEzsEe/HxbtnraXfnXH1J
NQ4mMOJNXwPVveNn1E9zA7IgFTMYsnGH080O5saloj2S6P6jb3PLXw==
=mDD0
-END PGP SIGNATURE-




Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-06 Thread Woodruff, Robert J
Robert Walsh wrote,
> I rebuilt OFED from scratch with the patch, and ran successfully on
> Intel MPI 2.0.1 with the refresh patch.  I could not get it to run on
> Intel MPI 3.0b.  If you could verify that the fix you mentioned that is
> in the 2.0.1 refresh patch also made it into 3.0b, I'd appreciate it.
> If you have a later beta version you could send me, that would be great,
> too.
>
> Regards,
>  Robert.

I spoke with our MPI team lead, and it is very likely that the fix that
is in 2.0.1-refresh did not make it into the 3.0 beta, but it should be
in the 3.0 release, which is scheduled to be completed in a couple of weeks.

woody




Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-06 Thread Robert Walsh
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

> I spoke with our MPI team lead and it is very likely that the fix that
> is in 2.0.1-refresh did not make it into 3.0 beta, but it should be
> in the 3.0 release schedule to be completed in a couple of weeks.

OK then - I'll wait for that.

Regards,
 Robert.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRP9kyfzvnpzTd9fxAQJu/wf+PEjyS1xAKzmXD+oZJxUNNeaW7QpqKz3h
zc370m74yIWjI+8GianGN4VM6Zx4InPdsRbGNGTd+FRhmZvYDhuuo8VBQUDdAZdB
Tkm+PomDIWdftj8cWCsiah4UkhzRv//83TiIkGZ5+zk25qOvQ6VAW4fy6vpJhKvo
uTW9Sow/G/BAIuMZ8wwg5Jyz5kbYxDxr+21jzQ+nblM/6YdGVco3GI1/z/dXwK5V
JEPIEu4ZxExOU9yGqS/hculq2Z9WFyGTBYoll67KkhpOuLUxiCxCxStA8Z0x52fG
OIhL0vKYgiOWLZnxZONRsy89OR/mUV7SNZeOZVqJSqMh7SpeLWWYHQ==
=SRiy
-END PGP SIGNATURE-




Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Arlin Davis
Robert,

Here is a slightly modified patch for your attributes issue. Can you give it a 
try?

Signed-off-by: Arlin Davis [EMAIL PROTECTED]

Index: dapl/openib/dapl_ib_util.c
===
--- dapl/openib/dapl_ib_util.c  (revision 9106)
+++ dapl/openib/dapl_ib_util.c  (working copy)
@@ -446,6 +446,7 @@
return(dapl_convert_errno(errno,ib_query_hca));
 
if (ia_attr != NULL) {
+   (void) dapl_os_memzero(ia_attr, sizeof(*ia_attr));
ia_attr->adapter_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
ia_attr->vendor_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
ia_attr->ia_address_ptr = 
@@ -470,7 +471,12 @@
/* ia_attr->hardware_version_minor   = dev_attr.fw_ver; */
ia_attr->max_eps  = dev_attr.max_qp;
ia_attr->max_dto_per_ep   = dev_attr.max_qp_wr;
-   ia_attr->max_rdma_read_per_ep = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_in = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_out= dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_per_ep_in  = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_per_ep_out = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_per_ep_in_guaranteed  = DAT_TRUE;
+   ia_attr->max_rdma_read_per_ep_out_guaranteed = DAT_TRUE;
ia_attr->max_evds = dev_attr.max_cq;
ia_attr->max_evd_qlen = dev_attr.max_cqe;
ia_attr->max_iov_segments_per_dto = dev_attr.max_sge;
@@ -501,6 +507,7 @@
}

if (ep_attr != NULL) {
+   (void) dapl_os_memzero(ep_attr, sizeof(*ep_attr));
ep_attr->max_mtu_size = port_attr.max_msg_sz;
ep_attr->max_rdma_size= port_attr.max_msg_sz;
ep_attr->max_recv_dtos= dev_attr.max_qp_wr;
Index: dapl/openib_cma/dapl_ib_util.c
===
--- dapl/openib_cma/dapl_ib_util.c  (revision 9106)
+++ dapl/openib_cma/dapl_ib_util.c  (working copy)
@@ -424,6 +424,7 @@
return(dapl_convert_errno(errno,ib_query_hca));
 
if (ia_attr != NULL) {
+   (void) dapl_os_memzero(ia_attr, sizeof(*ia_attr));
ia_attr->adapter_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
ia_attr->vendor_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
ia_attr->ia_address_ptr = 
@@ -446,6 +447,8 @@
ia_attr->hardware_version_major = dev_attr.hw_ver;
ia_attr->max_eps  = dev_attr.max_qp;
ia_attr->max_dto_per_ep   = dev_attr.max_qp_wr;
+   ia_attr->max_rdma_read_in = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_out= dev_attr.max_qp_rd_atom;
ia_attr->max_rdma_read_per_ep_in  = dev_attr.max_qp_rd_atom;
ia_attr->max_rdma_read_per_ep_out = dev_attr.max_qp_rd_atom;
ia_attr->max_rdma_read_per_ep_in_guaranteed  = DAT_TRUE;
@@ -481,6 +484,7 @@
}

if (ep_attr != NULL) {
+   (void) dapl_os_memzero(ep_attr, sizeof(*ep_attr));
ep_attr->max_mtu_size = port_attr.max_msg_sz;
ep_attr->max_rdma_size= port_attr.max_msg_sz;
ep_attr->max_recv_dtos= dev_attr.max_qp_wr;
Index: dapl/openib_scm/dapl_ib_util.c
===
--- dapl/openib_scm/dapl_ib_util.c  (revision 9106)
+++ dapl/openib_scm/dapl_ib_util.c  (working copy)
@@ -373,6 +373,7 @@
return(dapl_convert_errno(errno,ib_query_hca));
 
if (ia_attr != NULL) {
+   (void) dapl_os_memzero(ia_attr, sizeof(*ia_attr));
ia_attr->adapter_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
ia_attr->vendor_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
ia_attr->ia_address_ptr = 
(DAT_IA_ADDRESS_PTR)hca_ptr->hca_address;
@@ -390,7 +391,12 @@
/* ia_attr->hardware_version_minor   = dev_attr.fw_ver; */
ia_attr->max_eps  = dev_attr.max_qp;
ia_attr->max_dto_per_ep   = dev_attr.max_qp_wr;
-   ia_attr->max_rdma_read_per_ep = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_in = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_out= dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_per_ep_in  = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_per_ep_out = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_per_ep_in_guaranteed  = DAT_TRUE;
+   ia_attr->max_rdma_read_per_ep_out_guaranteed = DAT_TRUE;
ia_attr->max_evds = dev_attr.max_cq;
ia_attr->max_evd_qlen

Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Robert Walsh
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Arlin Davis wrote:
> Robert,
>
> Here is a slightly modified patch for your attributes issue. Can you give it
> a try?
>

I'll give it a spin this afternoon: it looks quite a bit more
comprehensive than the small patch I did.

Regards,
 Robert.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRP3sXfzvnpzTd9fxAQLwwAf+IOIsC+gqb9Juzt8rwJJlnSW1PjZFrRGi
NrCnRXvn52tsgclNNHGSzqOgkIntZ2TqxwEJJeTou3UhUQ5laJWEkQgwrvFTazcn
+IQH3BGDLFyZJJQO0WSi2685dEKOH5by6Zp9yVo9sy3Odu6jod2v/uCOjdGkR8ys
CvQW+y70qDmom1SJ9P2XQ4/dxxX/v2IFYOWMoVzMlDZsNnvnti/Uspwc1KpQeP6F
RRwWImlDyuuAW6+JX6atM5Lne797T5IO7MugW6d/+0oAMVU7H3oiDBdX+9tVwBci
IBJJ/PdQ8e7a7x4uOg+LKOSDH16IFVNaua4XhBfVmQEjf1y41KepDw==
=1zt8
-END PGP SIGNATURE-




Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Arlin Davis
Robert Walsh wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> Arlin Davis wrote:
> > Robert,
> >
> > Here is a slightly modified patch for your attributes issue. Can you give it
> > a try?
>
> I'll give it a spin this afternoon: it looks quite a bit more
> comprehensive than the small patch I did.
>
> Regards,
>  Robert.

Just added all appropriate RDMA in/out fields and some code to zero out 
the structure to avoid uninitialized data fields.

-arlin
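
The zero-then-fill approach described above can be sketched roughly as follows. This is a minimal standalone illustration, not the uDAPL code: `struct demo_attr`, `demo_query`, and the field layout are hypothetical stand-ins, and plain `memset` is used in place of `dapl_os_memzero`.

```c
#include <string.h>

/* Hypothetical stand-in for an attribute structure such as the one
 * passed as ia_attr in the patch. */
struct demo_attr {
    int max_eps;
    int max_rdma_read_in;
    int max_rdma_read_out;
    int reserved;   /* a field the provider never assigns explicitly */
};

/* Fill *attr from device limits. Zeroing first means any field that is
 * not assigned below reads back as 0 instead of leftover memory contents. */
void demo_query(struct demo_attr *attr, int max_qp, int max_rd_atom)
{
    if (attr == NULL)
        return;
    memset(attr, 0, sizeof(*attr));
    attr->max_eps = max_qp;
    attr->max_rdma_read_in = max_rd_atom;
    attr->max_rdma_read_out = max_rd_atom;
}
```

A caller that hands in a structure full of stale bytes should then see `reserved == 0` after the call, which is the property the memzero lines in the patch are aiming for.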




Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Robert Walsh
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

> Just added all appropriate RDMA in/out fields and some code to zero out
> the structure to avoid uninitialized data fields.

Yup.  By comprehensive, I meant better :-)
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRP32hfzvnpzTd9fxAQJnMwgAgcyxQpxdbk/eLEECXTnAOAYjyv3seTpE
Ir1s+K7JEYL2Rbyk9h9CzbK67YSYe4QeIE52pTopEVFw8mnSLaz+ZIOmvdRUiHSS
FiwEyfbXEPrFKZfyXu/REsigWx5vn7vCZid3hUIdx1vbt9eVAiVPGbAO1ALI8en9
/xc7iTGpYxwBwNOYbdhW0cOCjvobV98Fp6UJebvxd9xiRUS6c2JeZKLYdQyRO5rm
JV7L8HqJr1dS8nbAiPG7DSjCv7/3SFdQVr+Tgt5MQpVfD56z41eBBuXzEfeqsg5E
HHSxUOTdqizpscMyLudAWGAr5DZwOAQ4Z90zAL8gc2YYbjbOT3D6bA==
=JKRU
-END PGP SIGNATURE-




Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Robert Walsh
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Arlin Davis wrote:
> Robert,
>
> Here is a slightly modified patch for your attributes issue. Can you give it
> a try?

Oddly enough, I'm back to the same problem with your new patch as I saw
with the unpatched version:

  $ mpiexec -n 2 ./a.out
  I_MPI: [1] MPIDI_CH3I_RDMA_init(): will use DAPL provider from
registry: OpenIB-cma
  I_MPI: [0] MPIDI_CH3I_RDMA_init(): will use DAPL provider from
registry: OpenIB-cma
  I_MPI: [0] MPIDI_CH3_Init(): I_MPI: [1] MPIDI_CH3_Init(): will use
rdma configuration
  will use rdma configuration
  [1:ib-idev-06][rdma_iba_init_d.c:154] error(0x60029): OpenIB-cma:
could not create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
  Hello world: rank 0 of 2 running on ib-idev-05
  rank 1 in job 1  ib-idev-05_51891   caused collective abort of all ranks
exit status of rank 1: killed by signal 9

Still tracking this one down.  I noticed in the patch you removed a
couple of lines, too:

  - ia_attr->max_rdma_read_per_ep = dev_attr.max_qp_rd_atom;

Any particular reason why you did this?

Regards,
 Robert.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRP37QvzvnpzTd9fxAQI79wf6Anc3/Ve7tg3x31hE4i5qa9bB01qEYmEv
9xx4FQqXNbhMos9hHEQAWJ9S0sKccr+yCNekkIX6GzlaVDv+AKDzZF6uzA8Prrhr
CEcf28c1Pw7gflg8MMfVcnAHr2YG/hXyd+ve9m6cGv0rxgPqY6lWmHjghKDxKO7h
f/SaDOaVAuN6kEJMRgIrKIxDyFSVl4z1tGXAK3yHVhslvPqNqGwDqNfFMV6UQK+V
NNfKVVKVCttUWdzcVELzi3zkiat5xDdqIcwQr8xs2YaXHfAGeD4NurWowil887Sn
bRuh5soVdBaKW9mAtQWuAECt9VLDvyYReLWkEq6ikgilPGCeJluDEw==
=TNaE
-END PGP SIGNATURE-




Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Arlin Davis


> Oddly enough, I'm back to the same problem with your new patch as I saw
> with the unpatched version:

Hmmm. We ran this with OFED 1.1 RC3 and MPI 3.0b on an EM64T server with your 
adapter and it worked.

Did you ever pick up the Intel MPI 3.0 beta?

>   $ mpiexec -n 2 ./a.out
>   I_MPI: [1] MPIDI_CH3I_RDMA_init(): will use DAPL provider from
> registry: OpenIB-cma
>   I_MPI: [0] MPIDI_CH3I_RDMA_init(): will use DAPL provider from
> registry: OpenIB-cma
>   I_MPI: [0] MPIDI_CH3_Init(): I_MPI: [1] MPIDI_CH3_Init(): will use
> rdma configuration
>   will use rdma configuration
>   [1:ib-idev-06][rdma_iba_init_d.c:154] error(0x60029): OpenIB-cma:
> could not create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
>   Hello world: rank 0 of 2 running on ib-idev-05
>   rank 1 in job 1  ib-idev-05_51891   caused collective abort of all ranks
> exit status of rank 1: killed by signal 9
>
> Still tracking this one down.  I noticed in the patch you removed a
> couple of lines, too:
>
>   - ia_attr->max_rdma_read_per_ep = dev_attr.max_qp_rd_atom;
>
> Any particular reason why you did this?

max_rdma_read_per_ep is the same as max_rdma_read_per_ep_in.

Look at dat.h line #369

/* To support backwards compatibility for DAPL-1.0 */
#define max_rdma_read_per_ep max_rdma_read_per_ep_in
#define DAT_IA_FIELD_IA_MAX_DTO_PER_OP  DAT_IA_FIELD_IA_MAX_DTO_PER_EP_IN

/* To support backwards compatibility for DAPL-1.0 & DAPL-1.1 */
#define max_mtu_size max_message_size


-arlin
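
To illustrate why the removed assignment was redundant, here is a minimal sketch of a field-name alias done with the preprocessor. Only the `#define` line follows the dat.h excerpt above; `struct demo_ia_attr` and `demo_alias` are hypothetical names made up for this example.

```c
/* Hypothetical structure; the real one in dat.h has many more fields. */
struct demo_ia_attr {
    int max_rdma_read_per_ep_in;
    int max_rdma_read_per_ep_out;
};

/* The old field name survives only as a preprocessor alias for the new one. */
#define max_rdma_read_per_ep max_rdma_read_per_ep_in

/* Store through the old name, read back through the new name. */
int demo_alias(int limit)
{
    struct demo_ia_attr attr = {0, 0};
    attr.max_rdma_read_per_ep = limit;      /* expands to ..._per_ep_in */
    return attr.max_rdma_read_per_ep_in;    /* same storage */
}
```

Because both spellings name the same member, assigning `max_rdma_read_per_ep` and `max_rdma_read_per_ep_in` in the same function would simply write the same field twice.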




Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Robert Walsh
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

> > Oddly enough, I'm back to the same problem with your new patch as I saw
> > with the unpatched version:
>
> Hmmm. We ran this with OFED 1.1 RC3 and MPI 3.0b on an EM64T server with your
> adapter and it worked.

Weird - it's not working for me at all.  Maybe I'm messing up somewhere.
I've got a meeting for the next hour or so - I'll check again when I
get back.

> Did you ever pick up the Intel MPI 3.0 beta?

Yup.

> max_rdma_read_per_ep is the same as max_rdma_read_per_ep_in.

Ah - fair enough.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRP4DLPzvnpzTd9fxAQJ3nwgAiO+dLDRQv22RrBHYqHcodDwC2ZakxzFh
pXBn9j5kwzA2EmnXCvex14v7K168Alqr9lgUpfaGr6StZsCdBU0FY2TRjok41VFl
h+fYu78QFgDjleTMkp17Hl7RG9/r8AWzKzTG1LDn1YqwHrn9ngeZlqFfy1BP1tfB
pkkW+Nj7HQXbXUNiDc/V9HKW7eBOjwCvkfDI7Knbrfp2QVBI/9ABpWGO4bJf3P7X
n9ZzlEBN0SCOHKtGAa1gspQrmJGMHw0qyajUA6Yuyp1dWRygbl8L+ahF2BJFwZSx
KGyhoBRZexpP8m0AJASnKgAVjGf6JR31dL7O8WAOjD4QpFEofMSqqA==
=yDmH
-END PGP SIGNATURE-




Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Robert Walsh
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Woodruff, Robert J wrote:
 Robert Walsh wrote,
 I'll give it a spin this afternoon: it looks quite a bit more
 comprehensive than the small patch I did.
 
 I also just tried running the ib_rdma_bw test and it seems to
 be flaky if you stress it. If you just run the defaults, it seems to
 work, but if you crank up the iterations and the message size,
 it sometimes fails with.
 
 [EMAIL PROTECTED] bin]$ ./ib_rdma_bw -n 1 -t 1000 -s 200 rkl-12
 4730: | port=18515 | ib_port=1 | size=200 | tx_depth=1000 |
 iters=1 | duplex=0 | cma=0 |
 4730: Local address:  LID 0x03, QPN 0x001d, PSN 0x9e070c RKey 0x2302400
 VAddr 0x2a95dd3480
 4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey 0x2402500
 VAddr 0x2a95c85480
 4730:main: Completion with error at client:
 4730:main: Failed status 9: wr_id 3
 4730:main: scnt=7584, ccnt=6584
 [EMAIL PROTECTED] bin]$  

This looks like a known bug, the fix to which didn't make it into OFED
1.1-RC3.  Hopefully we can still get this into 1.1-RC4.
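
For readers decoding the quoted ib_rdma_bw output, the sketch below may help. It rests on two assumptions that are not stated anywhere in this thread: that `scnt` counts work requests posted and `ccnt` counts completions reaped, and that "Failed status 9" is a libibverbs work-completion status. The name table is assumed to follow the order of the first entries of `enum ibv_wc_status`, so check it against your own `verbs.h`. The helper names are hypothetical.

```c
#include <stddef.h>

/* Work requests still in flight, assuming scnt = posted, ccnt = completed. */
unsigned long demo_outstanding(unsigned long scnt, unsigned long ccnt)
{
    return scnt - ccnt;
}

/* Assumed to follow the order of enum ibv_wc_status in libibverbs;
 * only the first few values are listed here. */
static const char *const demo_wc_names[] = {
    "IBV_WC_SUCCESS",          /* 0 */
    "IBV_WC_LOC_LEN_ERR",      /* 1 */
    "IBV_WC_LOC_QP_OP_ERR",    /* 2 */
    "IBV_WC_LOC_EEC_OP_ERR",   /* 3 */
    "IBV_WC_LOC_PROT_ERR",     /* 4 */
    "IBV_WC_WR_FLUSH_ERR",     /* 5 */
    "IBV_WC_MW_BIND_ERR",      /* 6 */
    "IBV_WC_BAD_RESP_ERR",     /* 7 */
    "IBV_WC_LOC_ACCESS_ERR",   /* 8 */
    "IBV_WC_REM_INV_REQ_ERR",  /* 9 */
    "IBV_WC_REM_ACCESS_ERR",   /* 10 */
};

/* Map a numeric status to a name, or "unknown" if it is out of range. */
const char *demo_wc_name(int status)
{
    size_t n = sizeof(demo_wc_names) / sizeof(demo_wc_names[0]);
    if (status < 0 || (size_t)status >= n)
        return "unknown";
    return demo_wc_names[status];
}
```

With the numbers from the report, `demo_outstanding(7584, 6584)` is 1000, which matches the `tx_depth=1000` shown in the output, and `demo_wc_name(9)` gives `IBV_WC_REM_INV_REQ_ERR` under the assumed ordering.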

Regards,
 Robert.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRP4aOfzvnpzTd9fxAQKAEggAlZC5hYi9kdxLkj9Mfl/BwHJQxWUwsKcG
K2ck3jtrP6PVa04FdVI/TNL2XE7R3eu69vTfBaTS26pw2CVM6av0ztFiWEV2r5Fu
8FXGJBOuDOYxnwuA0o3yHSMVFtrRW6Jgn2G/JQPZ8IDAK7GrPj3VebvyclPwF5+d
KMPIFXJaTzjoJl2JEGFLiSlf+tFMOEs3vazrRwkZpQezKRcs3F1E6TQImtN7kuYK
0/IKxeS4ZOduXpczsJZgsPs6Y9kYi94XN0E4JeJJAh9Miq+bXkxhxbrafieNl7xW
n9m7i/phcFcngSzDwjBNXE2ZuQjujDpz94SRnkVedomYNbr5zKXBgQ==
=NurT
-END PGP SIGNATURE-




Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Robert Walsh
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

> Here is a slightly modified patch for your attributes issue. Can you give it
> a try?

I rebuilt OFED from scratch with the patch, and ran successfully on
Intel MPI 2.0.1 with the refresh patch.  I could not get it to run on
Intel MPI 3.0b.  If you could verify that the fix you mentioned that is
in the 2.0.1 refresh patch also made it into 3.0b, I'd appreciate it.
If you have a later beta version you could send me, that would be great,
too.

Regards,
 Robert.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRP4ijvzvnpzTd9fxAQIqeggAkJ4OQ3GrkpqyJUbHImgqbob6npINOv5L
lBUANcHZZ8DMFIq5hP4H+OYX2s/yoS3AKDGf0x8kHoVsTDFTFNe69bsGzJMT3znP
YDmq3ETN4aSGOgKX2NFzWs+mYG0pEN9uDt/SmEYmccYiIuK3lTlb8jxON6mqqJFL
nfitAp7WaLn7OS8A3CfVrAbWwYJ4U6UWPD/rB5sJTg8nTxECc94JaOhPZ90smB6H
9xk8OihEoTxodFLzcpaz/ORS4EPAle69Uw2tP3myjr/4w/SzLGJT6DFVpGQ0BaWC
jVXFYVKyVW4JmFMcW1X29ogmVNH8gEDBUfbG1P5Wd8sLzMMB18tINA==
=X/q7
-END PGP SIGNATURE-
