On 3 August 2012 06:12, David Dillow <[email protected]> wrote: > On Thu, 2012-08-02 at 11:04 +0000, Bart Van Assche wrote: >> On 07/24/12 15:43, Joseph Glanville wrote: >> > [35404.804723] BUG: unable to handle kernel NULL pointer dereference at >> > (null) >> >> I've been able to reproduce this ib_srp crash. Apparently if an SRP >> response is received after srp_reset_host() has been invoked >> srp_process_rsp() tries to call scmnd->scsi_done(scmnd) with scsi_done >> == NULL, hence the kernel oops. A candidate fix is available in this >> (rebased) tree: http://github.com/bvanassche/linux/tree/srp-ha. > > Hmm, I stopped looking at the thread when I noted the same points Roland > did -- it looked like it was in the target rather than the initiator, > and that ib_srp wasn't loaded (though it could have been built-in). > > I think I'm good with your fix, given a few minor changes: > * rebase it to mainline (I tried it quickly, got conflicts that > should be simple to resolve) > * s/srp_remove_req/srp_claim_req/ as it doesn't remove the > request. This isn't an issue you introduced; it should probably > have been renamed some time ago. > * in srp_remove_req(), the test for (scmnd && req->scmnd == scmnd) > should probably be marked likely() > * Similarly, the !scmnd test in srp_process_rsp() should be > unlikely() > * The reclamation of credits should be moved to srp_free_req(), > since we could see the case where a credit is available without > a corresponding request structure. > * Get rid of the BUG_ON in srp_process_rsp(); in the past, I would > have probably added it myself, but Andrew Morton called me on > one I had tried to add, and he was right -- it doesn't add > anything. > * I wonder if srp_free_req() is the right name, but I think I'm > deep in bike-shedding territory here. > > It'd be nice if we could avoid taking the lock twice in quick succession > during normal operations, but that's something for later. > > We should get this into 3.6, and send it to stable as well. I can make > the changes if you'd like, just let me know. > > Thanks, > > -- > Dave Dillow > National Center for Computational Science > Oak Ridge National Laboratory > (865) 241-6602 office > >
Hi Bart/David. I have had this deployed for a few hours now and haven't been able to trigger the crash again. Disconnecting targets works correctly as does removing luns. Thanks. :) Joseph. -- CTO | Orion Virtualisation Solutions | www.orionvm.com.au Phone: 1300 56 99 52 | Mobile: 0428 754 846 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to [email protected] More majordomo info at http://vger.kernel.org/majordomo-info.html
