On Mon, Sep 21, 2009 at 10:59 AM, Vladislav Bolkhovitin <v...@vlnb.net> wrote:
> Chris Worley, on 09/19/2009 01:31 AM wrote:
>>
>> On Mon, Sep 7, 2009 at 5:58 AM, Vladislav Bolkhovitin <v...@vlnb.net> wrote:
>>>
>>> Chris Worley, on 09/06/2009 05:41 PM wrote:
>>>>
>>>> On Sun, Sep 6, 2009 at 3:36 PM, Chris Worley <worl...@gmail.com> wrote:
>>>>>
>>>>> On Sun, Sep 6, 2009 at 3:17 PM, Bart Van Assche <bart.vanass...@gmail.com> wrote:
>>>>>>
>>>>>> On Fri, Sep 4, 2009 at 1:20 AM, Chris Worley <worl...@gmail.com> wrote:
>>>>>>>
>>>>>>> On Thu, Sep 3, 2009 at 11:38 AM, Chris Worley <worl...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> I've used a couple of initiators (different systems) w/ different
>>>>>>>> OSes, w/ different IB cards (all QDR) and different IB stacks
>>>>>>>> (built-in vs. OFED) and can repeat the problem in all but the
>>>>>>>> RHEL5.2/OFED 1.4.1 target and initiator (but, if the initiator is
>>>>>>>> WinOF and the target is RHEL5.2/OFED1.4.1, then the problem does
>>>>>>>> repeat).
>>>>>>>
>>>>>>> Here's a twist: I used the Ubuntu initiator w/ one of the RHEL
>>>>>>> targets, and the RHEL initiator (same machine as was running WinOF
>>>>>>> from the beginning of this thread) w/ one of the Ubuntu targets: in
>>>>>>> both cases, the problem does not repeat.
>>>>>>>
>>>>>>> That makes it sound like OFED is the cure on either side of the
>>>>>>> connection, but does not explain the issue w/ WinOF (which does fail
>>>>>>> w/ either Ubuntu or RHEL targets).
>>>>>>
>>>>>> These results are strange. Regarding the Linux-only tests, I was
>>>>>> assuming failure of a single component (Ubuntu SRP initiator, OFED SRP
>>>>>> initiator, Ubuntu IB driver, OFED IB driver or SRP target), but for
>>>>>> each of these components there is at least one test that passes and at
>>>>>> least one test that fails. So either my assumption is wrong or one of
>>>>>> the above test results is not repeatable.
>>>>>> Do you have the time to repeat the Linux-only tests?
>>>>>
>>>>> Last night I was rerunning the RHEL5.2 initiator w/ Ubuntu client, and
>>>>> the problem repeated; now, I can't repeat the case where it didn't
>>>>> fail. Still, no errors, other than the eventual timeouts previously
>>>>> shown; the target thinks all is fine, the initiator is stuck.
>>>>
>>>> ... and I haven't had any success w/ Ubuntu target and initiator, 8.10
>>>> or 9.04.
>>>
>>> 1. Try with kernel parameter maxcpus=1. It will somewhat relax possible
>>> races you have, although not completely.
>>
>> I finally got around to this test... 1 CPU works very well, w/o hangs
>> (will test all night to see if this holds true), 2 or more don't.
>> This is dual-socket NHM, so I can't specify more than one processor
>> w/o getting more than one socket.
>
> Where 1 CPU works well, on the target or initiator?

That was on the target.

> The race is on the
> corresponding host.
>
> I'd suggest you reproduce the problem with the latest SCST trunk, lockdep
> enabled on the suspected host (better on both) and mgmt_minor trace level
> enabled on the target. Then, after the hang, let the system sit for about
> half an hour, then send Bart and me (privately, compressed) the kernel logs
> from both systems, starting from the early boot messages.

I believe I tested comprehensively w/ lockdep and complete SCST message
dumps on the target (and lockdep on the initiator) and came up with no
messages or lock issues relevant to the issue. If you think I should
repeat this, I will.

>
> If you have dmesg-only output, please enable printk timestamps
> (CONFIG_PRINTK_TIME).

Ubuntu has been pretty good about that.

Thanks,

Chris

>
>> Chris
>>>
>>> 2. Try different hardware, including the motherboard. You could have
>>> something like http://lkml.org/lkml/2007/7/31/558 (not exactly it, of
>>> course)
>>>
>>>> Chris
>>>>>
>>>>> Chris
>>>>>>
>>>>>> Bart.
>>>>>>
>
>
_______________________________________________
general mailing list
general@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general