On 3 August 2012 06:12, David Dillow <[email protected]> wrote:
> On Thu, 2012-08-02 at 11:04 +0000, Bart Van Assche wrote:
>> On 07/24/12 15:43, Joseph Glanville wrote:
>> > [35404.804723] BUG: unable to handle kernel NULL pointer dereference at 
>> > (null)
>>
>> I've been able to reproduce this ib_srp crash. Apparently if an SRP
>> response is received after srp_reset_host() has been invoked
>> srp_process_rsp() tries to call scmnd->scsi_done(scmnd) with scsi_done
>> == NULL, hence the kernel oops. A candidate fix is available in this
>> (rebased) tree: http://github.com/bvanassche/linux/tree/srp-ha.
>
> Hmm, I stopped looking at the thread when I noted the same points Roland
> did -- it looked like it was in the target rather than the initiator,
> and that ib_srp wasn't loaded (though it could have been built-in).
>
> I think I'm good with your fix, given a few minor changes:
>       * rebase it to mainline (I tried it quickly, got conflicts that
>         should be simple to resolve)
>       * s/srp_remove_req/srp_claim_req/ as it doesn't remove the
>         request. This isn't an issue you introduced; it should probably
>         have been renamed some time ago.
>       * in srp_remove_req(), the test for (scmnd && req->scmnd == scmnd)
>         should probably be marked likely()
>       * Similarly, the !scmnd test in srp_process_rsp() should be
>         unlikely()
>       * The reclamation of credits should be moved to srp_free_req(),
>         since we could see the case where a credit is available without
>         a corresponding request structure.
>       * Get rid of the BUG_ON in srp_process_rsp(); in the past, I would
>         have probably added it myself, but Andrew Morton called me on
>         one I had tried to add, and he was right -- it doesn't add
>         anything.
>       * I wonder if srp_free_req() is the right name, but I think I'm
>         deep in bike-shedding territory here.
>
> It'd be nice if we could avoid taking the lock twice in quick succession
> during normal operations, but that's something for later.
>
> We should get this into 3.6, and send it to stable as well. I can make
> the changes if you'd like, just let me know.
>
> Thanks,
>
> --
> Dave Dillow
> National Center for Computational Science
> Oak Ridge National Laboratory
> (865) 241-6602 office
>
>

Hi Bart/David.

I have had this deployed for a few hours now and haven't been able to
trigger the crash again.
Disconnecting targets works correctly as does removing luns.

Thanks. :)

Joseph.

-- 
CTO | Orion Virtualisation Solutions | www.orionvm.com.au
Phone: 1300 56 99 52 | Mobile: 0428 754 846
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to