Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
Robert Walsh wrote,
>> [EMAIL PROTECTED] bin]$ ./ib_rdma_bw -n 1 -t 1000 -s 200 rkl-12
>> 4730: | port=18515 | ib_port=1 | size=200 | tx_depth=1000 | iters=1 | duplex=0 | cma=0 |
>> 4730: Local address: LID 0x03, QPN 0x001d, PSN 0x9e070c RKey 0x2302400 VAddr 0x2a95dd3480
>> 4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey 0x2402500 VAddr 0x2a95c85480
>> 4730:main: Completion with error at client:
>> 4730:main: Failed status 9: wr_id 3
>> 4730:main: scnt=7584, ccnt=6584
>> [EMAIL PROTECTED] bin]$
>
> Hi Woody,

Robert Walsh wrote,
> When RC4 is available, there should be a patch in there that will fix this. Can you let us know if you continue to see problems?
>
> Regards,
>  Robert.

I installed RC5 and now it just hangs,

[EMAIL PROTECTED] bin]$ ./ib_rdma_bw -n 1 -t 1000 -s 200 rkl-12
4702: | port=18515 | ib_port=1 | size=200 | tx_depth=1000 | iters=1 | duplex=0 | cma=0 |
4702: Local address: LID 0x03, QPN 0x000d, PSN 0xf1b711 RKey 0x1101200 VAddr 0x2a95dc8480
4702: Remote address: LID 0x04, QPN 0x000d, PSN 0xe62247, RKey 0x1101200 VAddr 0x2a95c7c480

hangs here and have to cntrl-c the test.

Intel MPI also fails with,

# Barrier
[1][rdma_iba.c:260] Intel MPI fatal error: DTO operation completed with error. status=0x8. cookie=0x514ee0
rank 1 in job 4 rkl-13_32779 caused collective abort of all ranks
exit status of rank 1: killed by signal 9

woody

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
> I installed RC5 and now it just hangs,
>
> [EMAIL PROTECTED] bin]$ ./ib_rdma_bw -n 1 -t 1000 -s 200 rkl-12
> 4702: | port=18515 | ib_port=1 | size=200 | tx_depth=1000 | iters=1 | duplex=0 | cma=0 |
> 4702: Local address: LID 0x03, QPN 0x000d, PSN 0xf1b711 RKey 0x1101200 VAddr 0x2a95dc8480
> 4702: Remote address: LID 0x04, QPN 0x000d, PSN 0xe62247, RKey 0x1101200 VAddr 0x2a95c7c480
>
> hangs here and have to cntrl-c the test.
>
> Intel MPI also fails with,
>
> # Barrier
> [1][rdma_iba.c:260] Intel MPI fatal error: DTO operation completed with error. status=0x8. cookie=0x514ee0
> rank 1 in job 4 rkl-13_32779 caused collective abort of all ranks
> exit status of rank 1: killed by signal 9

OK - thanks for the report - I'll look into it.
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
> I installed RC5 and now it just hangs,

Wow - we can't even get RC5 to build here. What distro are you running?

I've tried this on RC4 + a fixed libipathverbs package and it runs OK (although it does take a while, which might explain the hang you were seeing.) But mostly I'm curious how you get RC5 to build at all.

We really really really shouldn't be attempting to turn RC's around as fast as RC4 to RC5 went: we basically had about enough time to throw a patch together without being able to do much testing.
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
Woodruff, Robert J wrote:
> Robert Walsh wrote,
>>> [EMAIL PROTECTED] bin]$ ./ib_rdma_bw -n 1 -t 1000 -s 200 rkl-12
>>> 4730: | port=18515 | ib_port=1 | size=200 | tx_depth=1000 | iters=1 | duplex=0 | cma=0 |
>>> 4730: Local address: LID 0x03, QPN 0x001d, PSN 0x9e070c RKey 0x2302400 VAddr 0x2a95dd3480
>>> 4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey 0x2402500 VAddr 0x2a95c85480
>>> 4730:main: Completion with error at client:
>>> 4730:main: Failed status 9: wr_id 3
>>> 4730:main: scnt=7584, ccnt=6584
>>> [EMAIL PROTECTED] bin]$
>>
>> Hi Woody,
>
> Robert Walsh wrote,
>> When RC4 is available, there should be a patch in there that will fix this. Can you let us know if you continue to see problems?
>>
>> Regards,
>>  Robert.
>
> I installed RC5 and now it just hangs,
>
> [EMAIL PROTECTED] bin]$ ./ib_rdma_bw -n 1 -t 1000 -s 200 rkl-12
> 4702: | port=18515 | ib_port=1 | size=200 | tx_depth=1000 | iters=1 | duplex=0 | cma=0 |
> 4702: Local address: LID 0x03, QPN 0x000d, PSN 0xf1b711 RKey 0x1101200 VAddr 0x2a95dc8480
> 4702: Remote address: LID 0x04, QPN 0x000d, PSN 0xe62247, RKey 0x1101200 VAddr 0x2a95c7c480
>
> hangs here and have to cntrl-c the test.
>
> Intel MPI also fails with,
>
> # Barrier
> [1][rdma_iba.c:260] Intel MPI fatal error: DTO operation completed with error. status=0x8. cookie=0x514ee0
> rank 1 in job 4 rkl-13_32779 caused collective abort of all ranks
> exit status of rank 1: killed by signal 9

Hi Woody,

So, we built everything using RC5 plus the libipathverbs from subversion and we were successfully able to run ib_rdma_bw (with your arguments above) and Intel MPI (a simple MPI hello world program).

I'm going to continue testing with the Intel MPI testsuite and some ISV applications. I'll keep you informed.

Regards,
 Robert.
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
Well, it looks like the libipathverbs that went into 1.1 branch was botched. How come?

Please note that Mellanox for one is unable to test libipathverbs at all.

libipathverbs maintainers, please, try to fix by Sunday. And please, test the changes before you commit them.

Quoting r. Robert Walsh [EMAIL PROTECTED]:
> Subject: Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
>
> Woodruff, Robert J wrote:
>> Robert Walsh wrote,
>>>> [EMAIL PROTECTED] bin]$ ./ib_rdma_bw -n 1 -t 1000 -s 200 rkl-12
>>>> 4730: | port=18515 | ib_port=1 | size=200 | tx_depth=1000 | iters=1 | duplex=0 | cma=0 |
>>>> 4730: Local address: LID 0x03, QPN 0x001d, PSN 0x9e070c RKey 0x2302400 VAddr 0x2a95dd3480
>>>> 4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey 0x2402500 VAddr 0x2a95c85480
>>>> 4730:main: Completion with error at client:
>>>> 4730:main: Failed status 9: wr_id 3
>>>> 4730:main: scnt=7584, ccnt=6584
>>>> [EMAIL PROTECTED] bin]$
>>>
>>> Hi Woody,
>>
>> Robert Walsh wrote,
>>> When RC4 is available, there should be a patch in there that will fix this. Can you let us know if you continue to see problems?
>>>
>>> Regards,
>>>  Robert.
>>
>> I installed RC5 and now it just hangs,
>>
>> [EMAIL PROTECTED] bin]$ ./ib_rdma_bw -n 1 -t 1000 -s 200 rkl-12
>> 4702: | port=18515 | ib_port=1 | size=200 | tx_depth=1000 | iters=1 | duplex=0 | cma=0 |
>> 4702: Local address: LID 0x03, QPN 0x000d, PSN 0xf1b711 RKey 0x1101200 VAddr 0x2a95dc8480
>> 4702: Remote address: LID 0x04, QPN 0x000d, PSN 0xe62247, RKey 0x1101200 VAddr 0x2a95c7c480
>>
>> hangs here and have to cntrl-c the test.
>>
>> Intel MPI also fails with,
>>
>> # Barrier
>> [1][rdma_iba.c:260] Intel MPI fatal error: DTO operation completed with error. status=0x8. cookie=0x514ee0
>> rank 1 in job 4 rkl-13_32779 caused collective abort of all ranks
>> exit status of rank 1: killed by signal 9
>
> Hi Woody,
>
> So, we built everything using RC5 plus the libipathverbs from subversion and we were successfully able to run ib_rdma_bw (with your arguments above) and Intel MPI (a simple MPI hello world program).
>
> I'm going to continue testing with the Intel MPI testsuite and some ISV applications. I'll keep you informed.
>
> Regards,
>  Robert.

-- 
MST
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
Quoting r. Robert Walsh [EMAIL PROTECTED]:
> Subject: Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
>
>> I installed RC5 and now it just hangs,
>
> Wow - we can't even get RC5 to build here. What distro are you running?
>
> I've tried this on RC4 + a fixed libipathverbs package and it runs OK (although it does take a while, which might explain the hang you were seeing.) But mostly I'm curious how you get RC5 to build at all.
>
> We really really really shouldn't be attempting to turn RC's around as fast as RC4 to RC5 went: we basically had about enough time to throw a patch together without being able to do much testing.

Changes are expected to be tested before you commit. This is really maintainer's responsibility, please take it seriously.

-- 
MST
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
Robert Walsh wrote,
>> [EMAIL PROTECTED] bin]$ ./ib_rdma_bw -n 1 -t 1000 -s 200 rkl-12
>> 4730: | port=18515 | ib_port=1 | size=200 | tx_depth=1000 | iters=1 | duplex=0 | cma=0 |
>> 4730: Local address: LID 0x03, QPN 0x001d, PSN 0x9e070c RKey 0x2302400 VAddr 0x2a95dd3480
>> 4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey 0x2402500 VAddr 0x2a95c85480
>> 4730:main: Completion with error at client:
>> 4730:main: Failed status 9: wr_id 3
>> 4730:main: scnt=7584, ccnt=6584
>> [EMAIL PROTECTED] bin]$
>
> Hi Woody,

Robert Walsh wrote,
> When RC4 is available, there should be a patch in there that will fix this. Can you let us know if you continue to see problems?
>
> Regards,
>  Robert.

I installed RC4 and now get this,

[EMAIL PROTECTED] bin]$ ./ib_rdma_bw
9035: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 | duplex=0 | cma=0 |
libibverbs: Warning: no userspace device-specific driver found for uverbs0
        driver search path: /usr/local/ofed/lib64/infiniband
9035:main: No IB devices found

I tried getting the latest ofed 1.1 ipathverbs from svn today that I thought would have a fix for this, and I think I got it built ok, although the mellanox build environment is less than intuitive, but it still seems to fail.

Guess we will try again with RC5 tomorrow.

woody
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
Quoting r. Woodruff, Robert J [EMAIL PROTECTED]:
> I tried getting the latest ofed 1.1 ipathverbs from svn today that I thought would have a fix for this, and I think I got it built ok, although the mellanox build environment is less than intuitive, but it still seems to fail.
>
> Guess we will try again with RC5 tomorrow.

It's actually OFED build environment now :) So you really should report improvement suggestions on list.

-- 
MST
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
Woodruff, Robert J wrote:
> Robert Walsh wrote,
>> I'll give it a spin this afternoon: it looks quite a bit more comprehensive than the small patch I did.
>
> I also just tried running the ib_rdma_bw test and it seems to be flaky if you stress it. If you just run the defaults, it seems to work, but if you crank up the iterations and the message size, it sometimes fails with.
>
> [EMAIL PROTECTED] bin]$ ./ib_rdma_bw -n 1 -t 1000 -s 200 rkl-12
> 4730: | port=18515 | ib_port=1 | size=200 | tx_depth=1000 | iters=1 | duplex=0 | cma=0 |
> 4730: Local address: LID 0x03, QPN 0x001d, PSN 0x9e070c RKey 0x2302400 VAddr 0x2a95dd3480
> 4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey 0x2402500 VAddr 0x2a95c85480
> 4730:main: Completion with error at client:
> 4730:main: Failed status 9: wr_id 3
> 4730:main: scnt=7584, ccnt=6584
> [EMAIL PROTECTED] bin]$

Hi Woody,

When RC4 is available, there should be a patch in there that will fix this. Can you let us know if you continue to see problems?

Regards,
 Robert.
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
Robert Walsh wrote:
>> Here is a slightly modified patch for your attributes issue. Can you give it a try?
>
> I rebuilt OFED from scratch with the patch, and ran successfully on Intel MPI 2.0.1 with the refresh patch. I could not get it to run on Intel MPI 3.0b. If you could verify that the fix you mentioned that is in the 2.0.1 refresh patch also made it into 3.0b, I'd appreciate it. If you have a later beta version you could send me, that would be great, too.
>
> Regards,
>  Robert.

I added this patch under fixes to OFED 1.1. Will be in RC4

Tziporet
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
Tziporet Koren wrote:
> Robert Walsh wrote:
>>> Here is a slightly modified patch for your attributes issue. Can you give it a try?
>>
>> I rebuilt OFED from scratch with the patch, and ran successfully on Intel MPI 2.0.1 with the refresh patch. I could not get it to run on Intel MPI 3.0b. If you could verify that the fix you mentioned that is in the 2.0.1 refresh patch also made it into 3.0b, I'd appreciate it. If you have a later beta version you could send me, that would be great, too.
>>
>> Regards,
>>  Robert.
>
> I added this patch under fixes to OFED 1.1. Will be in RC4

Excellent. Thanks, Tziporet.

Regards,
 Robert.
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
Robert Walsh wrote,
> I rebuilt OFED from scratch with the patch, and ran successfully on Intel MPI 2.0.1 with the refresh patch. I could not get it to run on Intel MPI 3.0b. If you could verify that the fix you mentioned that is in the 2.0.1 refresh patch also made it into 3.0b, I'd appreciate it. If you have a later beta version you could send me, that would be great, too.
>
> Regards,
>  Robert.

I spoke with our MPI team lead and it is very likely that the fix that is in 2.0.1-refresh did not make it into 3.0 beta, but it should be in the 3.0 release scheduled to be completed in a couple of weeks.

woody
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
> I spoke with our MPI team lead and it is very likely that the fix that is in 2.0.1-refresh did not make it into 3.0 beta, but it should be in the 3.0 release scheduled to be completed in a couple of weeks.

OK then - I'll wait for that.

Regards,
 Robert.
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
Robert,

Here is a slightly modified patch for your attributes issue. Can you give it a try?

Signed-off by: Arlin Davis [EMAIL PROTECTED]

Index: dapl/openib/dapl_ib_util.c
===
--- dapl/openib/dapl_ib_util.c (revision 9106)
+++ dapl/openib/dapl_ib_util.c (working copy)
@@ -446,6 +446,7 @@
         return(dapl_convert_errno(errno,ib_query_hca));
     if (ia_attr != NULL) {
+        (void) dapl_os_memzero(ia_attr, sizeof(*ia_attr));
         ia_attr->adapter_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
         ia_attr->vendor_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
         ia_attr->ia_address_ptr =
@@ -470,7 +471,12 @@
         /* ia_attr->hardware_version_minor = dev_attr.fw_ver; */
         ia_attr->max_eps = dev_attr.max_qp;
         ia_attr->max_dto_per_ep = dev_attr.max_qp_wr;
-        ia_attr->max_rdma_read_per_ep = dev_attr.max_qp_rd_atom;
+        ia_attr->max_rdma_read_in = dev_attr.max_qp_rd_atom;
+        ia_attr->max_rdma_read_out= dev_attr.max_qp_rd_atom;
+        ia_attr->max_rdma_read_per_ep_in = dev_attr.max_qp_rd_atom;
+        ia_attr->max_rdma_read_per_ep_out = dev_attr.max_qp_rd_atom;
+        ia_attr->max_rdma_read_per_ep_in_guaranteed = DAT_TRUE;
+        ia_attr->max_rdma_read_per_ep_out_guaranteed = DAT_TRUE;
         ia_attr->max_evds = dev_attr.max_cq;
         ia_attr->max_evd_qlen = dev_attr.max_cqe;
         ia_attr->max_iov_segments_per_dto = dev_attr.max_sge;
@@ -501,6 +507,7 @@
     }
     if (ep_attr != NULL) {
+        (void) dapl_os_memzero(ep_attr, sizeof(*ep_attr));
         ep_attr->max_mtu_size = port_attr.max_msg_sz;
         ep_attr->max_rdma_size= port_attr.max_msg_sz;
         ep_attr->max_recv_dtos= dev_attr.max_qp_wr;
Index: dapl/openib_cma/dapl_ib_util.c
===
--- dapl/openib_cma/dapl_ib_util.c (revision 9106)
+++ dapl/openib_cma/dapl_ib_util.c (working copy)
@@ -424,6 +424,7 @@
         return(dapl_convert_errno(errno,ib_query_hca));
     if (ia_attr != NULL) {
+        (void) dapl_os_memzero(ia_attr, sizeof(*ia_attr));
         ia_attr->adapter_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
         ia_attr->vendor_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
         ia_attr->ia_address_ptr =
@@ -446,6 +447,8 @@
         ia_attr->hardware_version_major = dev_attr.hw_ver;
         ia_attr->max_eps = dev_attr.max_qp;
         ia_attr->max_dto_per_ep = dev_attr.max_qp_wr;
+        ia_attr->max_rdma_read_in = dev_attr.max_qp_rd_atom;
+        ia_attr->max_rdma_read_out= dev_attr.max_qp_rd_atom;
         ia_attr->max_rdma_read_per_ep_in = dev_attr.max_qp_rd_atom;
         ia_attr->max_rdma_read_per_ep_out = dev_attr.max_qp_rd_atom;
         ia_attr->max_rdma_read_per_ep_in_guaranteed = DAT_TRUE;
@@ -481,6 +484,7 @@
     }
     if (ep_attr != NULL) {
+        (void) dapl_os_memzero(ep_attr, sizeof(*ep_attr));
         ep_attr->max_mtu_size = port_attr.max_msg_sz;
         ep_attr->max_rdma_size= port_attr.max_msg_sz;
         ep_attr->max_recv_dtos= dev_attr.max_qp_wr;
Index: dapl/openib_scm/dapl_ib_util.c
===
--- dapl/openib_scm/dapl_ib_util.c (revision 9106)
+++ dapl/openib_scm/dapl_ib_util.c (working copy)
@@ -373,6 +373,7 @@
         return(dapl_convert_errno(errno,ib_query_hca));
     if (ia_attr != NULL) {
+        (void) dapl_os_memzero(ia_attr, sizeof(*ia_attr));
         ia_attr->adapter_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
         ia_attr->vendor_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
         ia_attr->ia_address_ptr = (DAT_IA_ADDRESS_PTR)hca_ptr->hca_address;
@@ -390,7 +391,12 @@
         /* ia_attr->hardware_version_minor = dev_attr.fw_ver; */
         ia_attr->max_eps = dev_attr.max_qp;
         ia_attr->max_dto_per_ep = dev_attr.max_qp_wr;
-        ia_attr->max_rdma_read_per_ep = dev_attr.max_qp_rd_atom;
+        ia_attr->max_rdma_read_in = dev_attr.max_qp_rd_atom;
+        ia_attr->max_rdma_read_out= dev_attr.max_qp_rd_atom;
+        ia_attr->max_rdma_read_per_ep_in = dev_attr.max_qp_rd_atom;
+        ia_attr->max_rdma_read_per_ep_out = dev_attr.max_qp_rd_atom;
+        ia_attr->max_rdma_read_per_ep_in_guaranteed = DAT_TRUE;
+        ia_attr->max_rdma_read_per_ep_out_guaranteed = DAT_TRUE;
         ia_attr->max_evds = dev_attr.max_cq;
         ia_attr->max_evd_qlen
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
Arlin Davis wrote:
> Robert,
>
> Here is a slightly modified patch for your attributes issue. Can you give it a try?

I'll give it a spin this afternoon: it looks quite a bit more comprehensive than the small patch I did.

Regards,
 Robert.
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
Robert Walsh wrote:
> Arlin Davis wrote:
>> Robert,
>>
>> Here is a slightly modified patch for your attributes issue. Can you give it a try?
>
> I'll give it a spin this afternoon: it looks quite a bit more comprehensive than the small patch I did.
>
> Regards,
>  Robert.

Just added all appropriate RDMA in/out fields and some code to zero out the structure to avoid uninitialized data fields.

-arlin
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
> Just added all appropriate RDMA in/out fields and some code to zero out the structure to avoid uninitialized data fields.

Yup. By comprehensive, I meant better :-)
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
Arlin Davis wrote:
> Robert,
>
> Here is a slightly modified patch for your attributes issue. Can you give it a try?

Oddly enough, I'm back to the same problem with your new patch as I saw with the unpatched version:

$ mpiexec -n 2 ./a.out
I_MPI: [1] MPIDI_CH3I_RDMA_init(): will use DAPL provider from registry: OpenIB-cma
I_MPI: [0] MPIDI_CH3I_RDMA_init(): will use DAPL provider from registry: OpenIB-cma
I_MPI: [0] MPIDI_CH3_Init(): I_MPI: [1] MPIDI_CH3_Init(): will use rdma configuration
will use rdma configuration
[1:ib-idev-06][rdma_iba_init_d.c:154] error(0x60029): OpenIB-cma: could not create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
Hello world: rank 0 of 2 running on ib-idev-05
rank 1 in job 1 ib-idev-05_51891 caused collective abort of all ranks
exit status of rank 1: killed by signal 9

Still tracking this one down. I noticed in the patch you removed a couple of lines, too:

- ia_attr->max_rdma_read_per_ep = dev_attr.max_qp_rd_atom;

Any particular reason why you did this?

Regards,
 Robert.
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
> Oddly enough, I'm back to the same problem with your new patch as I saw with the unpatched version:

Hmmm. We ran this with OFED 1.1 RC3 and MPI 3.0b on an EM64T server with your adapter and it worked. Did you ever pick up the Intel MPI 3.0 beta?

> $ mpiexec -n 2 ./a.out
> I_MPI: [1] MPIDI_CH3I_RDMA_init(): will use DAPL provider from registry: OpenIB-cma
> I_MPI: [0] MPIDI_CH3I_RDMA_init(): will use DAPL provider from registry: OpenIB-cma
> I_MPI: [0] MPIDI_CH3_Init(): I_MPI: [1] MPIDI_CH3_Init(): will use rdma configuration
> will use rdma configuration
> [1:ib-idev-06][rdma_iba_init_d.c:154] error(0x60029): OpenIB-cma: could not create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
> Hello world: rank 0 of 2 running on ib-idev-05
> rank 1 in job 1 ib-idev-05_51891 caused collective abort of all ranks
> exit status of rank 1: killed by signal 9
>
> Still tracking this one down. I noticed in the patch you removed a couple of lines, too:
>
> - ia_attr->max_rdma_read_per_ep = dev_attr.max_qp_rd_atom;
>
> Any particular reason why you did this?

max_rdma_read_per_ep is the same as max_rdma_read_per_ep_in. Look at dat.h line #369

/* To support backwards compatibility for DAPL-1.0 */
#define max_rdma_read_per_ep max_rdma_read_per_ep_in
#define DAT_IA_FIELD_IA_MAX_DTO_PER_OP DAT_IA_FIELD_IA_MAX_DTO_PER_EP_IN

/* To support backwards compatibility for DAPL-1.0 DAPL-1.1 */
#define max_mtu_size max_message_size

-arlin
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
>> Oddly enough, I'm back to the same problem with your new patch as I saw with the unpatched version:
>
> Hmmm. We ran this with OFED 1.1 RC3 and MPI 3.0b on an EM64T server with your adapter and it worked.

Weird - it's not working for me at all. Maybe I'm messing up somewhere. I've got a meeting for the next hour or so - I'll check again when I get back.

> Did you ever pick up the Intel MPI 3.0 beta?

Yup.

> max_rdma_read_per_ep is the same as max_rdma_read_per_ep_in.

Ah - fair enough.
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
Woodruff, Robert J wrote:
> Robert Walsh wrote,
>> I'll give it a spin this afternoon: it looks quite a bit more comprehensive than the small patch I did.
>
> I also just tried running the ib_rdma_bw test and it seems to be flaky if you stress it. If you just run the defaults, it seems to work, but if you crank up the iterations and the message size, it sometimes fails with.
>
> [EMAIL PROTECTED] bin]$ ./ib_rdma_bw -n 1 -t 1000 -s 200 rkl-12
> 4730: | port=18515 | ib_port=1 | size=200 | tx_depth=1000 | iters=1 | duplex=0 | cma=0 |
> 4730: Local address: LID 0x03, QPN 0x001d, PSN 0x9e070c RKey 0x2302400 VAddr 0x2a95dd3480
> 4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey 0x2402500 VAddr 0x2a95c85480
> 4730:main: Completion with error at client:
> 4730:main: Failed status 9: wr_id 3
> 4730:main: scnt=7584, ccnt=6584
> [EMAIL PROTECTED] bin]$

This looks like a known bug, the fix to which didn't make it into OFED 1.1-RC3. Hopefully we can still get this into 1.1-RC4.

Regards,
 Robert.
Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
> Here is a slightly modified patch for your attributes issue. Can you give it a try?

I rebuilt OFED from scratch with the patch, and ran successfully on Intel MPI 2.0.1 with the refresh patch. I could not get it to run on Intel MPI 3.0b. If you could verify that the fix you mentioned that is in the 2.0.1 refresh patch also made it into 3.0b, I'd appreciate it. If you have a later beta version you could send me, that would be great, too.

Regards,
 Robert.