Re: 4.10+ qla2xxx driver wont load for qla2xxx (ISP2532-based 8Gb) with BAR 3 error, work fine on 4.9

2017-03-14 Thread Laurence Oberman


- Original Message -
> From: "Himanshu Madhani" 
> To: "Laurence Oberman" 
> Cc: "Chad Dupuis" , "Linux SCSI List" 
> 
> Sent: Tuesday, March 14, 2017 8:32:25 PM
> Subject: Re: 4.10+ qla2xxx  driver wont load for qla2xxx (ISP2532-based 8Gb) 
> with BAR 3 error, work fine on 4.9
> 
> 
> - Original Message -
> > From: "Laurence Oberman" 
> > To: "Himanshu Madhani" 
> > Cc: "Chad Dupuis" , "Linux SCSI List"
> > 
> > Sent: Tuesday, March 14, 2017 8:02:32 PM
> > Subject: Re: 4.10+ qla2xxx  driver wont load for qla2xxx (ISP2532-based
> > 8Gb) with BAR 3 error, work fine on 4.9
> > 
> > 
> > 
> > - Original Message -
> > > From: "Himanshu Madhani" 
> > > To: "Laurence Oberman" 
> > > Cc: "Chad Dupuis" , "Linux SCSI List"
> > > 
> > > Sent: Tuesday, March 14, 2017 5:11:13 PM
> > > Subject: Re: 4.10+ qla2xxx  driver wont load for qla2xxx
> > > (ISP2532-based
> > > 8Gb) with BAR 3 error, work fine on 4.9
> > > 
> > > 
> > > - Original Message -
> > > > From: "Laurence Oberman" 
> > > > To: "Himanshu Madhani" 
> > > > Cc: "Chad Dupuis" , "Linux SCSI List"
> > > > 
> > > > Sent: Monday, March 13, 2017 9:06:38 PM
> > > > Subject: Re: 4.10+ qla2xxx  driver wont load for qla2xxx
> > > > (ISP2532-based
> > > > 8Gb) with BAR 3 error, work fine on 4.9
> > > > 
> > > > 
> > > > 
> > > > - Original Message -
> > > > > From: "Laurence Oberman" 
> > > > > To: "Himanshu Madhani" 
> > > > > Cc: "Chad Dupuis" , "Linux SCSI List"
> > > > > 
> > > > > Sent: Monday, March 13, 2017 12:54:12 PM
> > > > > Subject: Re: 4.10+ qla2xxx  driver wont load for qla2xxx
> > > > > (ISP2532-based
> > > > > 8Gb) with BAR 3 error, work fine on 4.9
> > > > > 
> > > > > 
> > > > > 
> > > > > - Original Message -
> > > > > > From: "Himanshu Madhani" 
> > > > > > To: "Laurence Oberman" , "Chad Dupuis"
> > > > > > 
> > > > > > Cc: "Linux SCSI List" 
> > > > > > Sent: Monday, March 13, 2017 12:39:03 PM
> > > > > > Subject: RE: 4.10+ qla2xxx  driver wont load for qla2xxx
> > > > > > (ISP2532-based
> > > > > > 8Gb) with BAR 3 error, work fine on 4.9
> > > > > > 
> > > > > > Hi Laurence,
> > > > > > 
> > > > > > > -Original Message-
> > > > > > > From: Laurence Oberman [mailto:lober...@redhat.com]
> > > > > > > Sent: Sunday, March 12, 2017 11:31 AM
> > > > > > > To: Dupuis, Chad ; Madhani,
> > > > > > > Himanshu
> > > > > > > 
> > > > > > > Cc: Linux SCSI List 
> > > > > > > Subject: Re: 4.10+ qla2xxx driver wont load for qla2xxx
> > > > > > > (ISP2532-based
> > > > > > > 8Gb)
> > > > > > > with BAR 3 error, work fine on 4.9
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > - Original Message -
> > > > > > > > From: "Laurence Oberman" 
> > > > > > > > To: "Chad Dupuis" , "Himanshu
> > > > > > > > Madhani"
> > > > > > > > 
> > > > > > > > Cc: "Linux SCSI List" 
> > > > > > > > Sent: Sunday, March 12, 2017 7:39:23 AM
> > > > > > > > Subject: 4.10+ qla2xxx  driver wont load for qla2xxx
> > > > > > > > (ISP2532-based
> > > > > > > > 8Gb) with BAR 3 error, work fine on 4.9
> > > > > > > >
> > > > > > > > Chad, Himanshu
> > > > > > > >
> > > > > > > > Before I bisect or go chase changes, wanted to reach
> > > > > > > > out
> > > > > > > > because the
> > > > > > > > driver seems to be the same version.
> > > > > > > > Perhaps this is a PCIE change in the kernel for 4.10
> > > > > > > > affecting
> > > > > > > > the
> > > > > > > > load.
> > > > > > > > Its the same targetLIO server I have been using for a
> > > > > > > > long
> > > > > > > > 

Re: 4.10+ qla2xxx driver wont load for qla2xxx (ISP2532-based 8Gb) with BAR 3 error, work fine on 4.9

2017-03-14 Thread Laurence Oberman


- Original Message -
> From: "Laurence Oberman" 
> To: "Himanshu Madhani" 
> Cc: "Chad Dupuis" , "Linux SCSI List" 
> 
> Sent: Tuesday, March 14, 2017 8:02:32 PM
> Subject: Re: 4.10+ qla2xxx  driver wont load for qla2xxx (ISP2532-based 8Gb) 
> with BAR 3 error, work fine on 4.9
> 
> 
> 
> - Original Message -
> > From: "Himanshu Madhani" 
> > To: "Laurence Oberman" 
> > Cc: "Chad Dupuis" , "Linux SCSI List"
> > 
> > Sent: Tuesday, March 14, 2017 5:11:13 PM
> > Subject: Re: 4.10+ qla2xxx  driver wont load for qla2xxx (ISP2532-based
> > 8Gb) with BAR 3 error, work fine on 4.9
> > 
> > 
> > - Original Message -
> > > From: "Laurence Oberman" 
> > > To: "Himanshu Madhani" 
> > > Cc: "Chad Dupuis" , "Linux SCSI List"
> > > 
> > > Sent: Monday, March 13, 2017 9:06:38 PM
> > > Subject: Re: 4.10+ qla2xxx  driver wont load for qla2xxx
> > > (ISP2532-based
> > > 8Gb) with BAR 3 error, work fine on 4.9
> > > 
> > > 
> > > 
> > > - Original Message -
> > > > From: "Laurence Oberman" 
> > > > To: "Himanshu Madhani" 
> > > > Cc: "Chad Dupuis" , "Linux SCSI List"
> > > > 
> > > > Sent: Monday, March 13, 2017 12:54:12 PM
> > > > Subject: Re: 4.10+ qla2xxx  driver wont load for qla2xxx
> > > > (ISP2532-based
> > > > 8Gb) with BAR 3 error, work fine on 4.9
> > > > 
> > > > 
> > > > 
> > > > - Original Message -
> > > > > From: "Himanshu Madhani" 
> > > > > To: "Laurence Oberman" , "Chad Dupuis"
> > > > > 
> > > > > Cc: "Linux SCSI List" 
> > > > > Sent: Monday, March 13, 2017 12:39:03 PM
> > > > > Subject: RE: 4.10+ qla2xxx  driver wont load for qla2xxx
> > > > > (ISP2532-based
> > > > > 8Gb) with BAR 3 error, work fine on 4.9
> > > > > 
> > > > > Hi Laurence,
> > > > > 
> > > > > > -Original Message-
> > > > > > From: Laurence Oberman [mailto:lober...@redhat.com]
> > > > > > Sent: Sunday, March 12, 2017 11:31 AM
> > > > > > To: Dupuis, Chad ; Madhani, Himanshu
> > > > > > 
> > > > > > Cc: Linux SCSI List 
> > > > > > Subject: Re: 4.10+ qla2xxx driver wont load for qla2xxx
> > > > > > (ISP2532-based
> > > > > > 8Gb)
> > > > > > with BAR 3 error, work fine on 4.9
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > - Original Message -
> > > > > > > From: "Laurence Oberman" 
> > > > > > > To: "Chad Dupuis" , "Himanshu
> > > > > > > Madhani"
> > > > > > > 
> > > > > > > Cc: "Linux SCSI List" 
> > > > > > > Sent: Sunday, March 12, 2017 7:39:23 AM
> > > > > > > Subject: 4.10+ qla2xxx  driver wont load for qla2xxx
> > > > > > > (ISP2532-based
> > > > > > > 8Gb) with BAR 3 error, work fine on 4.9
> > > > > > >
> > > > > > > Chad, Himanshu
> > > > > > >
> > > > > > > Before I bisect or go chase changes, wanted to reach out
> > > > > > > because the
> > > > > > > driver seems to be the same version.
> > > > > > > Perhaps this is a PCIE change in the kernel for 4.10
> > > > > > > affecting
> > > > > > > the
> > > > > > > load.
> > > > > > > Its the same targetLIO server I have been using for a long
> > > > > > > time
> > > > > > > with
> > > > > > > 4.9
> > > > > > >
> > > > > > > 27:00.0 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre
> > > > > > > Channel
> > > > > > > to
> > > > > > > PCI Express HBA (rev 02)
> > > > > > >
> > > > > > > With 4.9 I have no issues loading the driver for my targetLIO
> > > > > > > server.
> > > > > > > (DL380G8)
> > > > > > >
> > > > > > > # modinfo qla2xxx | more
> > > > > > > filename:
> > > > > > > 
> > /lib/modules/4.9.0.lobetcm+/kernel/drivers/scsi/qla2xxx/qla2xxx.ko
> > > > > > > firmware:   ql2500_fw.bin
> > > > > > > version:8.07.00.38-k
> > > > > > > license:GPL
> > > > > > > description:QLogic Fibre Channel HBA Driver
> > > > > > > author: QLogic Corporation
> > > > > > > srcversion: 94A8431A85BFF854B97B02D
> > > > > > >
> > > > > > > [8.906351] qla2xxx [:00:00.0]-0005: : QLogic Fibre
> > > > > > > Channel
> > > > > > > HBA
> >

Re: 4.10+ qla2xxx driver wont load for qla2xxx (ISP2532-based 8Gb) with BAR 3 error, work fine on 4.9

2017-03-14 Thread Laurence Oberman


- Original Message -
> From: "Himanshu Madhani" 
> To: "Laurence Oberman" 
> Cc: "Chad Dupuis" , "Linux SCSI List" 
> 
> Sent: Tuesday, March 14, 2017 5:11:13 PM
> Subject: Re: 4.10+ qla2xxx  driver wont load for qla2xxx (ISP2532-based 8Gb) 
> with BAR 3 error, work fine on 4.9
> 
> 
> - Original Message -
> > From: "Laurence Oberman" 
> > To: "Himanshu Madhani" 
> > Cc: "Chad Dupuis" , "Linux SCSI List"
> > 
> > Sent: Monday, March 13, 2017 9:06:38 PM
> > Subject: Re: 4.10+ qla2xxx  driver wont load for qla2xxx (ISP2532-based
> > 8Gb) with BAR 3 error, work fine on 4.9
> > 
> > 
> > 
> > - Original Message -
> > > From: "Laurence Oberman" 
> > > To: "Himanshu Madhani" 
> > > Cc: "Chad Dupuis" , "Linux SCSI List"
> > > 
> > > Sent: Monday, March 13, 2017 12:54:12 PM
> > > Subject: Re: 4.10+ qla2xxx  driver wont load for qla2xxx
> > > (ISP2532-based
> > > 8Gb) with BAR 3 error, work fine on 4.9
> > > 
> > > 
> > > 
> > > - Original Message -
> > > > From: "Himanshu Madhani" 
> > > > To: "Laurence Oberman" , "Chad Dupuis"
> > > > 
> > > > Cc: "Linux SCSI List" 
> > > > Sent: Monday, March 13, 2017 12:39:03 PM
> > > > Subject: RE: 4.10+ qla2xxx  driver wont load for qla2xxx
> > > > (ISP2532-based
> > > > 8Gb) with BAR 3 error, work fine on 4.9
> > > > 
> > > > Hi Laurence,
> > > > 
> > > > > -Original Message-
> > > > > From: Laurence Oberman [mailto:lober...@redhat.com]
> > > > > Sent: Sunday, March 12, 2017 11:31 AM
> > > > > To: Dupuis, Chad ; Madhani, Himanshu
> > > > > 
> > > > > Cc: Linux SCSI List 
> > > > > Subject: Re: 4.10+ qla2xxx driver wont load for qla2xxx
> > > > > (ISP2532-based
> > > > > 8Gb)
> > > > > with BAR 3 error, work fine on 4.9
> > > > > 
> > > > > 
> > > > > 
> > > > > - Original Message -
> > > > > > From: "Laurence Oberman" 
> > > > > > To: "Chad Dupuis" , "Himanshu Madhani"
> > > > > > 
> > > > > > Cc: "Linux SCSI List" 
> > > > > > Sent: Sunday, March 12, 2017 7:39:23 AM
> > > > > > Subject: 4.10+ qla2xxx  driver wont load for qla2xxx
> > > > > > (ISP2532-based
> > > > > > 8Gb) with BAR 3 error, work fine on 4.9
> > > > > >
> > > > > > Chad, Himanshu
> > > > > >
> > > > > > Before I bisect or go chase changes, wanted to reach out
> > > > > > because the
> > > > > > driver seems to be the same version.
> > > > > > Perhaps this is a PCIE change in the kernel for 4.10 affecting
> > > > > > the
> > > > > > load.
> > > > > > Its the same targetLIO server I have been using for a long time
> > > > > > with
> > > > > > 4.9
> > > > > >
> > > > > > 27:00.0 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre
> > > > > > Channel
> > > > > > to
> > > > > > PCI Express HBA (rev 02)
> > > > > >
> > > > > > With 4.9 I have no issues loading the driver for my targetLIO
> > > > > > server.
> > > > > > (DL380G8)
> > > > > >
> > > > > > # modinfo qla2xxx | more
> > > > > > filename:
> > > > > > 
> /lib/modules/4.9.0.lobetcm+/kernel/drivers/scsi/qla2xxx/qla2xxx.ko
> > > > > > firmware:   ql2500_fw.bin
> > > > > > version:8.07.00.38-k
> > > > > > license:GPL
> > > > > > description:QLogic Fibre Channel HBA Driver
> > > > > > author: QLogic Corporation
> > > > > > srcversion: 94A8431A85BFF854B97B02D
> > > > > >
> > > > > > [8.906351] qla2xxx [:00:00.0]-0005: : QLogic Fibre
> > > > > > Channel
> > > > > > HBA
> > > > > > Driver: 8.07.00.38-k.
> > > > > > [   10.014052] qla2xxx [:27:00.0]-001d: : Found an ISP2532
> > > > > > irq
> > > > > > 106
> > > > > > iobase
> > > > > > 0xadce989a1000.
> > > > > > [   10.455108] scsi host4: qla2xxx
> > > > > > [   10.460206] qla2xxx [:27:00.0]-00fb:4: QLogic QLE2562 -
> > > > > > PCI-Express
> > > > > > Dual Channel 8Gb Fibre Channel HBA.
> > > > > > [   10.460215] qla2xxx [:27:00.0]-00fc:4: ISP2532: PCIe
> > > > > > (5.0GT/s
> > > > > > x8)
> > > > > > @
> > > > > > :27:00.0 hdma+ host#=4 fw=8.03.00 (90d5).
> > > > > > [   10.460545] qla2xxx 

[PATCH] aacraid: Fix potential null access

2017-03-14 Thread Raghava Aditya Renukunta
Currently command threads fails to return ioctls commands for older
controller versions, since it returns when all the fibs have been
allocated. Another issue is even all the fibs have not been allocated,
the correct allocated fibs is not updated nor freed.

Fixes: 113156bcea9ef1e6 (scsi: aacraid: Reworked aac_command_thread)
Reported-by: Tomas Henzl 

Signed-off-by: Raghava Aditya Renukunta 
---
 drivers/scsi/aacraid/commsup.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
index a3ad042..c8172f1 100644
--- a/drivers/scsi/aacraid/commsup.c
+++ b/drivers/scsi/aacraid/commsup.c
@@ -2056,7 +2056,6 @@ static int fillup_pools(struct aac_dev *dev, struct 
hw_fib **hw_fib_pool,
 {
struct hw_fib **hw_fib_p;
struct fib **fib_p;
-   int rcode = 1;
 
hw_fib_p = hw_fib_pool;
fib_p = fib_pool;
@@ -2074,11 +2073,11 @@ static int fillup_pools(struct aac_dev *dev, struct 
hw_fib **hw_fib_pool,
}
}
 
+   /*
+* Get the actual number of allocated fibs
+*/
num = hw_fib_p - hw_fib_pool;
-   if (!num)
-   rcode = 0;
-
-   return rcode;
+   return num;
 }
 
 static void wakeup_fibctx_threads(struct aac_dev *dev,
@@ -2186,7 +2185,6 @@ static void aac_process_events(struct aac_dev *dev)
struct fib *fib;
unsigned long flags;
spinlock_t *t_lock;
-   unsigned int rcode;
 
t_lock = dev->queues->queue[HostNormCmdQueue].lock;
spin_lock_irqsave(t_lock, flags);
@@ -2269,8 +2267,8 @@ static void aac_process_events(struct aac_dev *dev)
 * Fill up fib pointer pools with actual fibs
 * and hw_fibs
 */
-   rcode = fillup_pools(dev, hw_fib_pool, fib_pool, num);
-   if (!rcode)
+   num = fillup_pools(dev, hw_fib_pool, fib_pool, num);
+   if (!num)
goto free_mem;
 
/*
-- 
1.8.3.1



[Bug 194837] VM with virtio-scsi drive often crashes during boot with kernel 4.11rc1

2017-03-14 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=194837

Omar Sandoval (osan...@osandov.com) changed:

   What|Removed |Added

 CC||osan...@osandov.com

--- Comment #8 from Omar Sandoval (osan...@osandov.com) ---
This was fixed in v4.11-rc2.

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Bug 194837] VM with virtio-scsi drive often crashes during boot with kernel 4.11rc1

2017-03-14 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=194837

--- Comment #7 from Thorsten Leemhuis (regressi...@leemhuis.info) ---
Leaving the keyboard now; I'm down to 1827adb11ad2...590dce2d4934 (includes
e0d072250a54669dce876d8ade70e417356aae74 Merge branch 'for-linus' of
git://git.kernel.dk/linux-block )

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: 4.10+ qla2xxx driver wont load for qla2xxx (ISP2532-based 8Gb) with BAR 3 error, work fine on 4.9

2017-03-14 Thread Madhani, Himanshu

- Original Message -
> From: "Laurence Oberman" 
> To: "Himanshu Madhani" 
> Cc: "Chad Dupuis" , "Linux SCSI List" 

> Sent: Monday, March 13, 2017 9:06:38 PM
> Subject: Re: 4.10+ qla2xxx  driver wont load for qla2xxx (ISP2532-based 
8Gb) with BAR 3 error, work fine on 4.9
> 
> 
> 
> - Original Message -
> > From: "Laurence Oberman" 
> > To: "Himanshu Madhani" 
> > Cc: "Chad Dupuis" , "Linux SCSI List"
> > 
> > Sent: Monday, March 13, 2017 12:54:12 PM
> > Subject: Re: 4.10+ qla2xxx  driver wont load for qla2xxx (ISP2532-based
> > 8Gb) with BAR 3 error, work fine on 4.9
> > 
> > 
> > 
> > - Original Message -
> > > From: "Himanshu Madhani" 
> > > To: "Laurence Oberman" , "Chad Dupuis"
> > > 
> > > Cc: "Linux SCSI List" 
> > > Sent: Monday, March 13, 2017 12:39:03 PM
> > > Subject: RE: 4.10+ qla2xxx  driver wont load for qla2xxx 
(ISP2532-based
> > > 8Gb) with BAR 3 error, work fine on 4.9
> > > 
> > > Hi Laurence,
> > > 
> > > > -Original Message-
> > > > From: Laurence Oberman [mailto:lober...@redhat.com]
> > > > Sent: Sunday, March 12, 2017 11:31 AM
> > > > To: Dupuis, Chad ; Madhani, Himanshu
> > > > 
> > > > Cc: Linux SCSI List 
> > > > Subject: Re: 4.10+ qla2xxx driver wont load for qla2xxx 
(ISP2532-based
> > > > 8Gb)
> > > > with BAR 3 error, work fine on 4.9
> > > > 
> > > > 
> > > > 
> > > > - Original Message -
> > > > > From: "Laurence Oberman" 
> > > > > To: "Chad Dupuis" , "Himanshu Madhani"
> > > > > 
> > > > > Cc: "Linux SCSI List" 
> > > > > Sent: Sunday, March 12, 2017 7:39:23 AM
> > > > > Subject: 4.10+ qla2xxx  driver wont load for qla2xxx 
(ISP2532-based
> > > > > 8Gb) with BAR 3 error, work fine on 4.9
> > > > >
> > > > > Chad, Himanshu
> > > > >
> > > > > Before I bisect or go chase changes, wanted to reach out because 
the
> > > > > driver seems to be the same version.
> > > > > Perhaps this is a PCIE change in the kernel for 4.10 affecting the
> > > > > load.
> > > > > Its the same targetLIO server I have been using for a long time 
with
> > > > > 4.9
> > > > >
> > > > > 27:00.0 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre 
Channel
> > > > > to
> > > > > PCI Express HBA (rev 02)
> > > > >
> > > > > With 4.9 I have no issues loading the driver for my targetLIO 
server.
> > > > > (DL380G8)
> > > > >
> > > > > # modinfo qla2xxx | more
> > > > > filename:
> > > > > /lib/modules/4.9.0.lobetcm+/kernel/drivers/scsi/qla2xxx/qla2xxx.ko
> > > > > firmware:   ql2500_fw.bin
> > > > > version:8.07.00.38-k
> > > > > license:GPL
> > > > > description:QLogic Fibre Channel HBA Driver
> > > > > author: QLogic Corporation
> > > > > srcversion: 94A8431A85BFF854B97B02D
> > > > >
> > > > > [8.906351] qla2xxx [:00:00.0]-0005: : QLogic Fibre Channel
> > > > > HBA
> > > > > Driver: 8.07.00.38-k.
> > > > > [   10.014052] qla2xxx [:27:00.0]-001d: : Found an ISP2532 irq
> > > > > 106
> > > > > iobase
> > > > > 0xadce989a1000.
> > > > > [   10.455108] scsi host4: qla2xxx
> > > > > [   10.460206] qla2xxx [:27:00.0]-00fb:4: QLogic QLE2562 -
> > > > > PCI-Express
> > > > > Dual Channel 8Gb Fibre Channel HBA.
> > > > > [   10.460215] qla2xxx [:27:00.0]-00fc:4: ISP2532: PCIe 
(5.0GT/s
> > > > > x8)
> > > > > @
> > > > > :27:00.0 hdma+ host#=4 fw=8.03.00 (90d5).
> > > > > [   10.460545] qla2xxx [:27:00.1]-001d: : Found an ISP2532 irq
> > > > > 110
> > > > > iobase
> > > > > 0xadce989a9000.
> > > > > [   10.662120] scsi host5: qla2xxx
> > > > > [   11.007841] qla2xxx [:27:00.1]-00fb:5: QLogic QLE2562 -
> > > > > PCI-Express
> > > > > Dual Channel 8Gb Fibre Channel HBA.
> > > > > [   11.007849] qla2xxx [:27:00.1]-00fc:5: ISP2532: PCIe 
(5.0GT/s
> > > > > x8)
> > > > > @
> > > > > :27:00.1 hdma+ host#=5 fw=8.03.00 (90d5).
> > > > >
> > > > > Rebooting on the same server with 4.10 fails to load
> > > > >
> > > > > Linux  4.10.0+
> > > > > # modinfo qla2xxx | more
> > > > > filename:
> > > > > /lib/modules/4.10.0+/kernel/drivers/scsi/qla2xxx/qla2xxx.ko
> > > > > 

[Bug 194837] VM with virtio-scsi drive often crashes during boot with kernel 4.11rc1

2017-03-14 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=194837

Thorsten Leemhuis (regressi...@leemhuis.info) changed:

   What|Removed |Added

 CC||regressi...@leemhuis.info

--- Comment #6 from Thorsten Leemhuis (regressi...@leemhuis.info) ---
FWIW, the culprit is somewhere between 10851-g20b83643abbc and
11620-ga3b4924b027f

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: [PATCHv3 5/6] scsi: make scsi_eh_scmd_add() always succeed

2017-03-14 Thread Benjamin Block
Hello Hannes,

On Wed, Mar 01, 2017 at 10:15:19AM +0100, Hannes Reinecke wrote:
> scsi_eh_scmd_add() currently only will fail if no
> error handler thread is started (which will never be the
> case) or if the state machine encounters an illegal transition.
> 
> But if we're encountering an invalid state transition
> chances is we cannot fixup things with the error handler.
> So better add a WARN_ON for illegal host states and
> make scsi_dh_scmd_add() a void function.
> 
> Signed-off-by: Hannes Reinecke 
> Reviewed-by: Johannes Thumshirn 
> ---
>  drivers/scsi/scsi_error.c | 41 +
>  drivers/scsi/scsi_lib.c   |  4 ++--
>  drivers/scsi/scsi_priv.h  |  2 +-
>  3 files changed, 16 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 4613aa1..802a081 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -163,13 +163,7 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
> *shost)
>   }
>   }
> 
> - if (!scsi_eh_scmd_add(scmd, 0)) {
> - SCSI_LOG_ERROR_RECOVERY(3,
> - scmd_printk(KERN_WARNING, scmd,
> - "terminate aborted command\n"));
> - set_host_byte(scmd, DID_TIME_OUT);
> - scsi_finish_command(scmd);
> - }
> + scsi_eh_scmd_add(scmd, 0);
>  }
> 
>  /**
> @@ -224,28 +218,23 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
> *shost)
>   * scsi_eh_scmd_add - add scsi cmd to error handling.
>   * @scmd:scmd to run eh on.
>   * @eh_flag: optional SCSI_EH flag.
> - *
> - * Return value:
> - *   0 on failure.
>   */
> -int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
> +void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
>  {
>   struct Scsi_Host *shost = scmd->device->host;
>   unsigned long flags;
> - int ret = 0;
> + int ret;
> 
> - if (!shost->ehandler)
> - return 0;
> + WARN_ON_ONCE(!shost->ehandler);
>

So I saw that you already changed stuff here after Bart reviewed it. Do
you think it would be useful to make the warning per-host-instance,
rather than per-linux-instance?

Like, if for some reason the EH thread for one SCSI host dies - however
that might happen - we would get an individual warning in the klog for
this one host and could maybe even save the setup by
disabling/re-enabling the whole host - effectively removing the host and
adding a new one, plus a new EH thread for it.

With this WARN_ON_ONCE we end up in a broken setup w/o any information
what exactly broke. Previously we would get at least a SCSI-logging
message. Which would also help with analysis of such bugs - correlate
data etc.


Beste Grüße / Best regards,
  - Benjamin Block

> 
>   spin_lock_irqsave(shost->host_lock, flags);
> - if (scsi_host_set_state(shost, SHOST_RECOVERY))
> - if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY))
> - goto out_unlock;
> -
> + if (scsi_host_set_state(shost, SHOST_RECOVERY)) {
> + ret = scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY);
> + WARN_ON_ONCE(ret);
> + }
>   if (shost->eh_deadline != -1 && !shost->last_reset)
>   shost->last_reset = jiffies;
> 
> - ret = 1;
>   if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
>   eh_flag &= ~SCSI_EH_CANCEL_CMD;
>   scmd->eh_eflags |= eh_flag;
> @@ -253,9 +242,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
>   list_add_tail(>eh_entry, >eh_cmd_q);
>   shost->host_failed++;
>   scsi_eh_wakeup(shost);
> - out_unlock:
>   spin_unlock_irqrestore(shost->host_lock, flags);
> - return ret;
>  }
> 
>  /**
> @@ -284,13 +271,11 @@ enum blk_eh_timer_return scsi_times_out(struct request 
> *req)
>   rtn = host->hostt->eh_timed_out(scmd);
> 
>   if (rtn == BLK_EH_NOT_HANDLED) {
> - if (!host->hostt->no_async_abort &&
> - scsi_abort_command(scmd) == SUCCESS)
> - return BLK_EH_NOT_HANDLED;
> -
> - set_host_byte(scmd, DID_TIME_OUT);
> - if (!scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD))
> - rtn = BLK_EH_HANDLED;
> + if (host->hostt->no_async_abort ||
> + scsi_abort_command(scmd) != SUCCESS) {
> + set_host_byte(scmd, DID_TIME_OUT);
> + scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD);
> + }
>   }
> 
>   return rtn;
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index f5e45a2..0735a46 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1593,8 +1593,8 @@ static void scsi_softirq_done(struct request *rq)
>   scsi_queue_insert(cmd, 

Re: [PATCHv3 4/6] scsi_error: do not escalate failed EH command

2017-03-14 Thread Benjamin Block
Hello Hannes,

On Wed, Mar 01, 2017 at 10:15:18AM +0100, Hannes Reinecke wrote:
> When a command is sent as part of the error handling there
> is not point whatsoever to start EH escalation when that
> command fails; we are _already_ in the error handler,
> and the escalation is about to commence anyway.
> So just call 'scsi_try_to_abort_cmd()' to abort outstanding
> commands and let the main EH routine handle the rest.
>
> Signed-off-by: Hannes Reinecke 
> Reviewed-by: Johannes Thumshirn 
> Reviewed-by: Bart Van Assche 
> ---
>  drivers/scsi/scsi_error.c | 11 +--
>  1 file changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index e1ca3b8..4613aa1 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -889,15 +889,6 @@ static int scsi_try_to_abort_cmd(struct 
> scsi_host_template *hostt,
>   return hostt->eh_abort_handler(scmd);
>  }
>
> -static void scsi_abort_eh_cmnd(struct scsi_cmnd *scmd)
> -{
> - if (scsi_try_to_abort_cmd(scmd->device->host->hostt, scmd) != SUCCESS)
> - if (scsi_try_bus_device_reset(scmd) != SUCCESS)
> - if (scsi_try_target_reset(scmd) != SUCCESS)
> - if (scsi_try_bus_reset(scmd) != SUCCESS)
> - scsi_try_host_reset(scmd);
> -}
> -
>  /**
>   * scsi_eh_prep_cmnd  - Save a scsi command info as part of error recovery
>   * @scmd:   SCSI command structure to hijack
> @@ -1082,7 +1073,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, 
> unsigned char *cmnd,
>   break;
>   }
>   } else if (rtn != FAILED) {
> - scsi_abort_eh_cmnd(scmd);
> + scsi_try_to_abort_cmd(shost->hostt, scmd);
>   rtn = FAILED;
>   }

The idea is sound, but this implementation would cause "use-after-free"s.

I only know our own LLD well enough to judge, but with zFCP there will
always be a chance that an abort fails - be it memory pressure,
hardware/firmware behavior or internal EH in zFCP.

Calling queuecommand() will mean for us in the LLD, that we allocate a
unique internal request struct for the scsi_cmnd (struct
zfcp_fsf_request) and add that to our internal hash-table with
outstanding commands. We assume this scsi_cmnd-pointer is ours till we
complete it via scsi_done are yield it via successful EH-actions.

In case the abort fails, you fail to take back the ownership over the
scsi command. Which in turn means possible "use-after-free"s when we
still thinks the scsi command is ours, but EH has already overwritten
the scsi-command with the original one. When we still get an answer or
otherwise use the scsi_cmnd-pointer we would access an invalid one.

I guess this might as well be true for other LLDs.


Beste Grüße / Best regards,
  - Benjamin Block

>
> --
> 1.8.5.6
>

--
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [PATCHv3 3/6] scsi: make eh_eflags persistent

2017-03-14 Thread Benjamin Block
Hello Hannes,

sry that I only now reply to the other patches of the series.

On Wed, Mar 01, 2017 at 10:15:17AM +0100, Hannes Reinecke wrote:
> To detect if a failed command has been retried we must not
> clear scmd->eh_eflags when EH finishes.
> The flag should be persistent throughout the lifetime
> of the command.
>
> Reviewed-by: Johannes Thumshirn 
> Signed-off-by: Hannes Reinecke 
> ---
>  Documentation/scsi/scsi_eh.txt  | 14 +++---
>  drivers/scsi/libsas/sas_scsi_host.c |  2 --
>  drivers/scsi/scsi_error.c   |  4 ++--
>  include/scsi/scsi_eh.h  |  1 +
>  4 files changed, 10 insertions(+), 11 deletions(-)
>
> diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt
> index 37eca00..4edb9c1c 100644
> --- a/Documentation/scsi/scsi_eh.txt
> +++ b/Documentation/scsi/scsi_eh.txt
> @@ -105,11 +105,14 @@ function
>
>   2. If the host supports asynchronous completion (as indicated by the
>  no_async_abort setting in the host template) scsi_abort_command()
> -is invoked to schedule an asynchrous abort. If that fails
> -Step #3 is taken.
> +is invoked to schedule an asynchrous abort.
> +Asynchronous abort are not invoked for commands which the
> +SCSI_EH_ABORT_SCHEDULED flag is set (this indicates that the command
> +already had been aborted once, and this is a retry which failed),
> +or when the EH deadline is expired. In these case Step #3 is taken.
>
> - 2. scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD) is invoked for the
> -command.  See [1-3] for more information.
> + 3. scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD) is invoked for the
> +command.  See [1-4] for more information.
>
>  [1-3] Asynchronous command aborts
>
> @@ -263,7 +266,6 @@ scmd->allowed.
>
>   3. scmd recovered
>  ACTION: scsi_eh_finish_cmd() is invoked to EH-finish scmd
> - - clear scmd->eh_eflags
>   - scsi_setup_cmd_retry()
>   - move from local eh_work_q to local eh_done_q
>  LOCKING: none
> @@ -456,8 +458,6 @@ except for #1 must be implemented by 
> eh_strategy_handler().
>
>   - shost->host_failed is zero.
>
> - - Each scmd's eh_eflags field is cleared.
> -
>   - Each scmd is in such a state that scsi_setup_cmd_retry() on the
> scmd doesn't make any difference.
>
> diff --git a/drivers/scsi/libsas/sas_scsi_host.c 
> b/drivers/scsi/libsas/sas_scsi_host.c
> index ee6b39a..87e5079 100644
> --- a/drivers/scsi/libsas/sas_scsi_host.c
> +++ b/drivers/scsi/libsas/sas_scsi_host.c
> @@ -613,8 +613,6 @@ static void sas_eh_handle_sas_errors(struct Scsi_Host 
> *shost, struct list_head *
>   SAS_DPRINTK("trying to find task 0x%p\n", task);
>   res = sas_scsi_find_task(task);
>
> - cmd->eh_eflags = 0;
> -
>   switch (res) {
>   case TASK_IS_DONE:
>   SAS_DPRINTK("%s: task 0x%p is done\n", __func__,
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index cec439c..e1ca3b8 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -189,7 +189,6 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
> *shost)
>   /*
>* Retry after abort failed, escalate to next level.
>*/
> - scmd->eh_eflags &= ~SCSI_EH_ABORT_SCHEDULED;
>   SCSI_LOG_ERROR_RECOVERY(3,
>   scmd_printk(KERN_INFO, scmd,
>   "previous abort failed\n"));
> @@ -933,6 +932,7 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct 
> scsi_eh_save *ses,
>   ses->result = scmd->result;
>   ses->underflow = scmd->underflow;
>   ses->prot_op = scmd->prot_op;
> + ses->eh_eflags = scmd->eh_eflags;
>
>   scmd->prot_op = SCSI_PROT_NORMAL;
>   scmd->eh_eflags = 0;
> @@ -996,6 +996,7 @@ void scsi_eh_restore_cmnd(struct scsi_cmnd* scmd, struct 
> scsi_eh_save *ses)
>   scmd->result = ses->result;
>   scmd->underflow = ses->underflow;
>   scmd->prot_op = ses->prot_op;
> + scmd->eh_eflags = ses->eh_eflags;
>  }
>  EXPORT_SYMBOL(scsi_eh_restore_cmnd);
>
> @@ -1140,7 +1141,6 @@ static int scsi_eh_reset(struct scsi_cmnd *scmd)
>   */
>  void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q)
>  {
> - scmd->eh_eflags = 0;
>   list_move_tail(>eh_entry, done_q);
>  }
>  EXPORT_SYMBOL(scsi_eh_finish_cmd);
> diff --git a/include/scsi/scsi_eh.h b/include/scsi/scsi_eh.h
> index 98d366b..a25b328 100644
> --- a/include/scsi/scsi_eh.h
> +++ b/include/scsi/scsi_eh.h
> @@ -31,6 +31,7 @@ extern int scsi_get_sense_info_fld(const u8 * sense_buffer, 
> int sb_len,
>  struct scsi_eh_save {
>   /* saved state */
>   int result;
> + int eh_eflags;
>   enum dma_data_direction data_direction;
>   unsigned underflow;
>   unsigned char cmd_len;
> --
> 1.8.5.6
>

So I think the code is good. The only thing I am missing is a bit of
reasoning why we want to 

Re: [PATCHv3 6/6] scsi: make asynchronous aborts mandatory

2017-03-14 Thread Benjamin Block
Hello Hannes,

On Wed, Mar 01, 2017 at 10:15:20AM +0100, Hannes Reinecke wrote:
> There hasn't been any reports for HBAs where asynchronous abort
> would not work, so we should make it mandatory and remove
> the fallback.
> 
> Signed-off-by: Hannes Reinecke 
> Reviewed-by: Johannes Thumshirn 
> Reviewed-by: Bart Van Assche 
> ---
>  Documentation/scsi/scsi_eh.txt | 18 --
>  drivers/scsi/scsi_error.c  | 81 
> --
>  drivers/scsi/scsi_lib.c|  2 +-
>  drivers/scsi/scsi_priv.h   |  3 +-
>  include/scsi/scsi_host.h   |  5 ---
>  5 files changed, 15 insertions(+), 94 deletions(-)
> 
> diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt
> index 4edb9c1c..11e447b 100644
> --- a/Documentation/scsi/scsi_eh.txt
> +++ b/Documentation/scsi/scsi_eh.txt
> @@ -70,7 +70,7 @@ with the command.
>   scmd is requeued to blk queue.
> 
>   - otherwise
> - scsi_eh_scmd_add(scmd, 0) is invoked for the command.  See
> + scsi_eh_scmd_add(scmd) is invoked for the command.  See
>   [1-3] for details of this function.
> 
> 
> @@ -103,9 +103,7 @@ function
>  eh_timed_out() callback did not handle the command.
>   Step #2 is taken.
> 
> - 2. If the host supports asynchronous completion (as indicated by the
> -no_async_abort setting in the host template) scsi_abort_command()
> -is invoked to schedule an asynchrous abort.
> + 2. scsi_abort_command() is invoked to schedule an asynchrous abort.
>  Asynchronous abort are not invoked for commands which the
>  SCSI_EH_ABORT_SCHEDULED flag is set (this indicates that the command
>  already had been aborted once, and this is a retry which failed),
> @@ -127,16 +125,13 @@ function
> 
>   scmds enter EH via scsi_eh_scmd_add(), which does the following.
> 
> - 1. Turns on scmd->eh_eflags as requested.  It's 0 for error
> -completions and SCSI_EH_CANCEL_CMD for timeouts.
> + 1. Links scmd->eh_entry to shost->eh_cmd_q
> 
> - 2. Links scmd->eh_entry to shost->eh_cmd_q
> + 2. Sets SHOST_RECOVERY bit in shost->shost_state
> 
> - 3. Sets SHOST_RECOVERY bit in shost->shost_state
> + 3. Increments shost->host_failed
> 
> - 4. Increments shost->host_failed
> -
> - 5. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed
> + 4. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed
> 
>   As can be seen above, once any scmd is added to shost->eh_cmd_q,
>  SHOST_RECOVERY shost_state bit is turned on.  This prevents any new
> @@ -252,7 +247,6 @@ scmd->allowed.
> 
>   1. Error completion / time out
>  ACTION: scsi_eh_scmd_add() is invoked for scmd
> - - set scmd->eh_eflags
>   - add scmd to shost->eh_cmd_q
>   - set SHOST_RECOVERY
>   - shost->host_failed++
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 802a081..7b70ee9 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -163,7 +163,7 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
> *shost)
>   }
>   }
> 
> - scsi_eh_scmd_add(scmd, 0);
> + scsi_eh_scmd_add(scmd);
>  }
> 
>  /**
> @@ -217,9 +217,8 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
> *shost)
>  /**
>   * scsi_eh_scmd_add - add scsi cmd to error handling.
>   * @scmd:scmd to run eh on.
> - * @eh_flag: optional SCSI_EH flag.
>   */
> -void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
> +void scsi_eh_scmd_add(struct scsi_cmnd *scmd)
>  {
>   struct Scsi_Host *shost = scmd->device->host;
>   unsigned long flags;
> @@ -235,9 +234,6 @@ void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
>   if (shost->eh_deadline != -1 && !shost->last_reset)
>   shost->last_reset = jiffies;
> 
> - if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
> - eh_flag &= ~SCSI_EH_CANCEL_CMD;
> - scmd->eh_eflags |= eh_flag;
>   scsi_eh_reset(scmd);
>   list_add_tail(>eh_entry, >eh_cmd_q);
>   shost->host_failed++;
> @@ -271,10 +267,9 @@ enum blk_eh_timer_return scsi_times_out(struct request 
> *req)
>   rtn = host->hostt->eh_timed_out(scmd);
> 
>   if (rtn == BLK_EH_NOT_HANDLED) {
> - if (host->hostt->no_async_abort ||
> - scsi_abort_command(scmd) != SUCCESS) {
> + if (scsi_abort_command(scmd) != SUCCESS) {
>   set_host_byte(scmd, DID_TIME_OUT);
> - scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD);
> + scsi_eh_scmd_add(scmd);
>   }
>   }
> 
> @@ -327,7 +322,7 @@ static inline void scsi_eh_prt_fail_stats(struct 
> Scsi_Host *shost,
>   list_for_each_entry(scmd, work_q, eh_entry) {
>   if (scmd->device == sdev) {
>   ++total_failures;
> - if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD)
> +

[PATCH V2] qla2xxx: Fix crash in qla2xxx_eh_abort on bad ptr

2017-03-14 Thread Bill Kuzeja
After a Qlogic card breaks when initializing (test case), the system can
crash in qla2xxx_eh_abort if processing anything but a scsi command type 
srb.
---

Changes from previous version:
---
 - V2 - add concise desciption for git log

After the break, qla2x00_abort_all_cmds gets invoked. This routine has a
relatively new section introduced by these commits:

commit 1535aa75a3d8 ("scsi: qla2xxx: fix invalid DMA access after command 
aborts in PCI device remove")
commit c733ab351243 ("scsi: qla2xxx: do not abort all commands in the adapter 
during EEH recovery")
commit 2780f3c8f0233 ("scsi: qla2xxx: Avoid that issuing a LIP triggers a 
kernel crash") 

The section is problematic in certain cases. Here's the if statement in 
question:

if (GET_CMD_SP(sp) && !ha->flags.eeh_busy) {
   /* Get a reference to the sp and drop the lock.
* The reference ensures this sp->done() call
* - and not the call in qla2xxx_eh_abort() -
* ends the SCSI command (with result 'res').
*/
sp_get(sp);
spin_unlock_irqrestore(>hardware_lock,flags);
qla2xxx_eh_abort(GET_CMD_SP(sp));
spin_lock_irqsave(>hardware_lock, flags);
}

Now here's the panic my test provokes:

[  927.823858] Call Trace:
[  927.581661]  qla2xxx_eh_abort+0x19/0x2b0 [qla2xxx]
[  927.829269]  [] qla2x00_abort_all_cmds+0xf6/0x14 [qla2xxx]
[  927.845014]  [] 
qla2x00_disable_board_on_pci_error+0x8f/0x160 [qla2xxx]
[  927.863054]  [] process_one_work+0x17b/0x470
[  927.875916]  [] worker_thread+0x126/0x410
[  927.888203]  [] ? rescuer_thread+0x460/0x460
[  927.901067]  [] kthread+0xcf/0xe0
[  927.911823]  [] ?kthread_create_on_node+0x140/0x140
[  927.926224]  [] ret_from_fork+0x58/0x90
[  927.938132]  [] ?kthread_create_on_node+0x140/0x140

We crash in qla2xxx_eh_abort trying to get the vha:

scsi_qla_host_t *vha = shost_priv(cmd->device->host);

Closer examination of the crash shows that the value of "cmd" is 2,
certainly not a valid pointer.

What's happening here?

The check at the top of the if-block GET_CMD_SP(sp) implicitly != NULL 
would have prevented this sort of thing. However, since sp->u.scmd.cmd 
is not *quite* null (in my crashes it's usually 2), we fall into the 
if-block and call qla2xxx_eh_abort - and crash trying to get cmd.

Note that the GET_CMD_SP(sp) is doing this:

#define GET_CMD_SP(sp) (sp->u.scmd.cmd)

and is acting upon a union:

union {
struct srb_iocb iocb_cmd;
struct fc_bsg_job *bsg_job;
struct srb_cmd scmd;
}

The address it's getting is the first element in this structure:

struct srb_cmd {
struct scsi_cmnd *cmd;  /* Linux SCSI command pkt */

}

However, since this is a union, the same memory can be used this way 
instead:

struct srb_iocb {
union {
struct {
uint16_t flags;
uint16_t data[2];
u32 iop[2];
} logio;


Turns out, in the kernel panics I have gotten, the iocb type is logio
(verified by srb->type = SRB_LOGIN_CMD). 

To further check, I looked at the logio iocb in the crash:

logio = {
  flags = 0x2,
  data = {0x0, 0x0}


which follows since:

   lio->u.logio.flags |= SRB_LOGIN_COND_PLOGI;

and

#define SRB_LOGIN_COND_PLOGIBIT_1

In order to eliminate this crash, this patch adds an extra check to the
top of the if statement to check whether or not sp->type == SRB_SCSI_CMD. 
If not, then don't bother doing the rest of the if-block. It doesn't look
like we should be invoking qla2xxx_eh_abort for anything other than
srb_cmds anyways.

I thought about changing the definition of GET_CMD_SP to include this
type check and return NULL if sp is not type SRB_SCSI_CMD - like this:

#define GET_CMD_SP(sp) ((sp->type == SRB_SCSI_CMD) ? sp->u.scmd.cmd : NULL)

I decided against it as there are multiple places in the code that do not
check for NULL. If you're calling GET_CMD_SP you should be dealing with 
an SRB_SCSI_CMDbut we aren't in this case. So, for this patch I went 
with the more contained and safer change.

This problem is hard to hit, but I have run into it doing negative
testing many times now (with a test specifically designed to bring it
out), so I can verify that this fix works. My testing has been against
a RHEL7 driver variant, but the bug and patch are equally relevant to
to the upstream driver.

Fixes: 1535aa75a3d8 ("scsi: qla2xxx: fix invalid DMA access after command 
aborts in PCI device remove")
Signed-off-by: Bill Kuzeja 
---
 drivers/scsi/qla2xxx/qla_os.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index d01c90c..4eec095 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -1617,7 +1617,8 @@ uint32_t 

Re: [PATCH] scsi: storvsc: Add support for FC rport.

2017-03-14 Thread Cathy Avery

Good catch. Thanks!

On 03/14/2017 12:42 PM, Stephen Hemminger wrote:

On Tue, 14 Mar 2017 12:01:03 -0400
Cathy Avery  wrote:


  #if IS_ENABLED(CONFIG_SCSI_FC_ATTRS)
if (host->transportt == fc_transport_template) {
+   struct fc_rport_identifiers ids;
+
+   ids.node_name = 0;
+   ids.port_name = 0;
+   ids.port_id = 0;
+   ids.roles |= FC_PORT_ROLE_FCP_TARGET;

Since the variable ids is on the stack, it is uninitialized data.
Doing a OR with uninitialized data is not correct.

Better off to use C99 style iniatializer and skip the zero fields.

struct fc_rport_identifiers ids = {
.roles = FC_PORT_ROLE_FCP_TARGET,
};




Re: [PATCH] scsi: storvsc: Add support for FC rport.

2017-03-14 Thread Stephen Hemminger
On Tue, 14 Mar 2017 12:01:03 -0400
Cathy Avery  wrote:

>  #if IS_ENABLED(CONFIG_SCSI_FC_ATTRS)
>   if (host->transportt == fc_transport_template) {
> + struct fc_rport_identifiers ids;
> +
> + ids.node_name = 0;
> + ids.port_name = 0;
> + ids.port_id = 0;
> + ids.roles |= FC_PORT_ROLE_FCP_TARGET;

Since the variable ids is on the stack, it is uninitialized data.
Doing a OR with uninitialized data is not correct.

Better off to use C99 style iniatializer and skip the zero fields.

struct fc_rport_identifiers ids = {
.roles = FC_PORT_ROLE_FCP_TARGET,
};


[PATCH] scsi: storvsc: Add support for FC rport.

2017-03-14 Thread Cathy Avery
Included in the current storvsc driver for Hyper-V is the ability
to access luns on an FC fabric via a virtualized fiber channel
adapter exposed by the Hyper-V host. The driver also attaches to the FC
transport to allow host and port names to be published under
/sys/class/fc_host/hostX. Current customer tools running on the VM
require that these names be available in the well known standard
location under fc_host/hostX.

A problem arose when attaching to the FC transport. The scsi_scan
code attempts to call fc_user_scan which has basically become a no-op
due to the fact that the virtualized FC device does not expose rports.
At this point you cannot refresh the scsi bus after mapping or unmapping
luns on the SAN without a reboot of the VM.

This patch stubs in an rport per fc_host in storvsc so that the
requirement of a defined rport is now met within the fc_transport and
echo "- - -" > /sys/class/scsi_host/hostX/scan now works.

Signed-off-by: Cathy Avery 
---
 drivers/scsi/storvsc_drv.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 638e5f4..c6c0316 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -478,6 +478,9 @@ struct storvsc_device {
 */
u64 node_name;
u64 port_name;
+#if IS_ENABLED(CONFIG_SCSI_FC_ATTRS)
+   struct fc_rport *rport;
+#endif
 };
 
 struct hv_host_device {
@@ -1823,8 +1826,16 @@ static int storvsc_probe(struct hv_device *device,
}
 #if IS_ENABLED(CONFIG_SCSI_FC_ATTRS)
if (host->transportt == fc_transport_template) {
+   struct fc_rport_identifiers ids;
+
+   ids.node_name = 0;
+   ids.port_name = 0;
+   ids.port_id = 0;
+   ids.roles |= FC_PORT_ROLE_FCP_TARGET;
+
fc_host_node_name(host) = stor_device->node_name;
fc_host_port_name(host) = stor_device->port_name;
+   stor_device->rport = fc_remote_port_add(host, 0, );
}
 #endif
return 0;
@@ -1854,8 +1865,10 @@ static int storvsc_remove(struct hv_device *dev)
struct Scsi_Host *host = stor_device->host;
 
 #if IS_ENABLED(CONFIG_SCSI_FC_ATTRS)
-   if (host->transportt == fc_transport_template)
+   if (host->transportt == fc_transport_template) {
+   fc_remote_port_delete(stor_device->rport);
fc_remove_host(host);
+   }
 #endif
scsi_remove_host(host);
storvsc_dev_remove(dev);
-- 
2.5.0



Re: [PATCH 08/29] drivers, md: convert mddev.active from atomic_t to refcount_t

2017-03-14 Thread James Bottomley
On Tue, 2017-03-14 at 12:29 +, Reshetova, Elena wrote:
> > Elena Reshetova  writes:
> > 
> > > refcount_t type and corresponding API should be
> > > used instead of atomic_t when the variable is used as
> > > a reference counter. This allows to avoid accidental
> > > refcounter overflows that might lead to use-after-free
> > > situations.
> > > 
> > > Signed-off-by: Elena Reshetova 
> > > Signed-off-by: Hans Liljestrand 
> > > Signed-off-by: Kees Cook 
> > > Signed-off-by: David Windsor 
> > > ---
> > >  drivers/md/md.c | 6 +++---
> > >  drivers/md/md.h | 3 ++-
> > >  2 files changed, 5 insertions(+), 4 deletions(-)
> > 
> > When booting linux-next (specifically 5be4921c9958ec) I'm seeing
> > the
> > backtrace below. I suspect this patch is just exposing an existing
> > issue?
> 
> Yes, we have actually been following this issue in the another
> thread. 
> It looks like the object is re-used somehow, but I can't quite
> understand how just by reading the code. 
> This was what I put into the previous thread:
> 
> "The log below indicates that you are using your refcounter in a bit
> weird way in mddev_find(). 
> However, I can't find the place (just by reading the code) where you
> would increment refcounter from zero (vs. setting it to one).
> It looks like you either iterate over existing nodes (and increment
> their counters, which should be >= 1 at the time of increment) or
> create a new node, but then mddev_init() sets the counter to 1. "
> 
> If you can help to understand what is going on with the object
> creation/destruction, would be appreciated!
> 
> Also Shaohua Li stopped this patch coming from his tree since the 
> issue was caught at that time, so we are not going to merge this 
> until we figure it out. 

Asking on the correct list (dm-devel) would have got you the easy
answer:  The refcount behind mddev->active is a genuine atomic.  It has
refcount properties but only if the array fails to initialise (in that
case, final put kills it).  Once it's added to the system as a gendisk,
it cannot be freed until md_free().  Thus its ->active count can go to
zero (when it becomes inactive; usually because of an unmount). On a
simple allocation regardless of outcome, the last executed statement in
md_alloc is mddev_put(): that destroys the device if we didn't manage
to create it or returns 0 and adds an inactive device to the system
which the user can get with mddev_find().

James




Re: [PATCH v2] scsi_sysfs: fix hang when removing scsi device

2017-03-14 Thread Israel Rukshin

Hello Bart,

Patch number 5 doesn't handle the case when device_for_each_child() is 
called.
device_for_each_child() calls to target_unblock() that uses also 
starget_for_each_device().
After applying also the following change the hang disappeared but it 
didn't fix the warning.
Those fixes are not enough because if fast_io_fail_tmo is set to 
infinity then the hang will remain,
because only __rport_fail_io_fast() calls to scsi_target_unblock() and 
terminate_rport_io()

that free the sync cache command.

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index e5d4b50..09f9566 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -3068,9 +3068,15 @@ void scsi_device_resume(struct scsi_device *sdev)
 static int
 target_unblock(struct device *dev, void *data)
 {
-   if (scsi_is_target_device(dev))
-   starget_for_each_device(to_scsi_target(dev), data,
-   device_unblock);
+   if (scsi_is_target_device(dev)) {
+   struct scsi_target *starget = to_scsi_target(dev);
+   struct Scsi_Host *shost = dev_to_shost(dev->parent);
+   unsigned long flags;
+
+   spin_lock_irqsave(shost->host_lock, flags);
+   __starget_for_each_device(starget, data, device_unblock);
+   spin_unlock_irqrestore(shost->host_lock, flags);
+   }
return 0;
 }

--
1.8.4.3


Israel


On 3/14/2017 11:44 AM, Israel Rukshin wrote:

Hello Bart,

I applied and tested patches 3, 4 and 5.
I am sorry to tell you that it didn't work.
I saw the warning you added and the hang at device_del().

[  333.696462] [ cut here ]
[  333.696471] WARNING: CPU: 11 PID: 321 at 
drivers/scsi/scsi_sysfs.c:1280 __scsi_remove_device+0x106/0x110
[  333.696472] Modules linked in: nfsv3 ib_srp(-) dm_service_time 
scsi_transport_srp ib_uverbs ib_umad ib_ipoib ib_cm mlx4_ib ib_core 
rpcsec_gss_krb5 nfsv4 dns_resolver nfs netconsole fscache dm_mirror 
dm_region_hash dm_log sb_edac edac_core x86_pkg_temp_thermal 
intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul 
crc32_pclmul ghash_clmulni_intel iTCO_wdt pcbc iTCO_vendor_support 
aesni_intel mei_me crypto_simd joydev input_leds ipmi_si ioatdma mei 
glue_helper lpc_ich pcspkr cryptd ipmi_devintf shpchp ipmi_msghandler 
sg i2c_i801 mfd_core dm_multipath dm_mod nfsd binfmt_misc auth_rpcgss 
nfs_acl lockd grace sunrpc ip_tables ext4 jbd2 mbcache sd_mod mgag200 
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops isci igb 
ttm ahci libsas ptp libahci scsi_transport_sas pps_core dca crc32c_intel
[  333.696511]  drm i2c_algo_bit libata mlx4_core fjes [last unloaded: 
ib_srp]
[  333.696516] CPU: 11 PID: 321 Comm: kworker/11:1 Not tainted 
4.11.0-rc2+ #99
[  333.696517] Hardware name: Supermicro X9DRFR/X9DRFR, BIOS 1.0a 
09/11/2012

[  333.696522] Workqueue: srp_remove srp_remove_work [ib_srp]
[  333.696523] Call Trace:
[  333.696530]  dump_stack+0x63/0x90
[  333.696534]  __warn+0xcb/0xf0
[  333.696536]  warn_slowpath_null+0x1d/0x20
[  333.696538]  __scsi_remove_device+0x106/0x110
[  333.696540]  scsi_forget_host+0x60/0x70
[  333.696545]  scsi_remove_host+0x77/0x110
[  333.696547]  srp_remove_work+0x90/0x230 [ib_srp]
[  333.696551]  process_one_work+0x177/0x430
[  333.696553]  worker_thread+0x4e/0x4b0
[  333.696555]  kthread+0x101/0x140
[  333.696556]  ? process_one_work+0x430/0x430
[  333.696558]  ? kthread_create_on_node+0x60/0x60
[  333.696563]  ret_from_fork+0x2c/0x40
[  333.696565] ---[ end trace b1edd584bf21aaba ]---

Israel.


On 3/14/2017 4:35 AM, Bart Van Assche wrote:

On Mon, 2017-03-13 at 14:55 -0700, James Bottomley wrote:

This is true, but I don't see how it can cause the host to be freed
before the sdev.  The memory for struct Scsi_Host is freed in the
shost_gendev release routine, which should be pinned by the parent
traversal from sdev.  So it should not be possible for
  scsi_host_dev_release() to be called before
  scsi_device_dev_release_usercontext() becase the latter has the final
put of the parent device.

Hello James,

I will run a bisect to see whether that provides more information about
what caused the change in the reference counting behavior.

Israel, since you did not hit the reference counting issue in your 
tests,

can you repeat your test with patches 3, 4 and 5 applied?

Thanks,

Bart.






RE: [PATCH 08/29] drivers, md: convert mddev.active from atomic_t to refcount_t

2017-03-14 Thread Reshetova, Elena
> Elena Reshetova  writes:
> 
> > refcount_t type and corresponding API should be
> > used instead of atomic_t when the variable is used as
> > a reference counter. This allows to avoid accidental
> > refcounter overflows that might lead to use-after-free
> > situations.
> >
> > Signed-off-by: Elena Reshetova 
> > Signed-off-by: Hans Liljestrand 
> > Signed-off-by: Kees Cook 
> > Signed-off-by: David Windsor 
> > ---
> >  drivers/md/md.c | 6 +++---
> >  drivers/md/md.h | 3 ++-
> >  2 files changed, 5 insertions(+), 4 deletions(-)
> 
> When booting linux-next (specifically 5be4921c9958ec) I'm seeing the
> backtrace below. I suspect this patch is just exposing an existing
> issue?

Yes, we have actually been following this issue in the another thread. 
It looks like the object is re-used somehow, but I can't quite understand how 
just by reading the code. 
This was what I put into the previous thread:

"The log below indicates that you are using your refcounter in a bit weird way 
in mddev_find(). 
However, I can't find the place (just by reading the code) where you would 
increment refcounter from zero (vs. setting it to one).
It looks like you either iterate over existing nodes (and increment their 
counters, which should be >= 1 at the time of increment) or create a new node, 
but then mddev_init() sets the counter to 1. "

If you can help to understand what is going on with the object 
creation/destruction, would be appreciated!

Also Shaohua Li stopped this patch coming from his tree since the issue was 
caught at that time, so we are not going to merge this until we figure it out. 

Best Regards,
Elena.

> 
> cheers
> 
> 
> [0.230738] md: Waiting for all devices to be available before autodetect
> [0.230742] md: If you don't use raid, use raid=noautodetect
> [0.230962] refcount_t: increment on 0; use-after-free.
> [0.230988] [ cut here ]
> [0.230996] WARNING: CPU: 0 PID: 1 at lib/refcount.c:114
> .refcount_inc+0x5c/0x70
> [0.231001] Modules linked in:
> [0.231006] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.11.0-rc1-gccN-next-
> 20170310-g5be4921 #1
> [0.231012] task: c0004940 task.stack: c0004944
> [0.231016] NIP: c05ac6bc LR: c05ac6b8 CTR:
> c0743390
> [0.231021] REGS: c00049443160 TRAP: 0700   Not tainted  (4.11.0-rc1-
> gccN-next-20170310-g5be4921)
> [0.231026] MSR: 80029032 
> [0.231033]   CR: 24024422  XER: 000c
> [0.231038] CFAR: c0a5356c SOFTE: 1
> [0.231038] GPR00: c05ac6b8 c000494433e0 c1079d00
> 002b
> [0.231038] GPR04:  00ef 
> c10418a0
> [0.231038] GPR08: 4af8 c0ecc9a8 c0ecc9a8
> 
> [0.231038] GPR12: 28024824 c6bb 
> c00049443a00
> [0.231038] GPR16:  c00049443a10 
> 
> [0.231038] GPR20:   c0f7dd20
> 
> [0.231038] GPR24: 014080c0 c12060b8 c1206080
> 0009
> [0.231038] GPR28: c0f7dde0 0090 
> c000461ae800
> [0.231100] NIP [c05ac6bc] .refcount_inc+0x5c/0x70
> [0.231104] LR [c05ac6b8] .refcount_inc+0x58/0x70
> [0.231108] Call Trace:
> [0.231112] [c000494433e0] [c05ac6b8] .refcount_inc+0x58/0x70
> (unreliable)
> [0.231120] [c00049443450] [c086c008]
> .mddev_find+0x1e8/0x430
> [0.231125] [c00049443530] [c0872b6c] .md_open+0x2c/0x140
> [0.231132] [c000494435c0] [c03962a4]
> .__blkdev_get+0xd4/0x520
> [0.231138] [c00049443690] [c0396cc0] .blkdev_get+0x1c0/0x4f0
> [0.231145] [c00049443790] [c0336d64]
> .do_dentry_open.isra.1+0x2a4/0x410
> [0.231152] [c00049443830] [c03523f4]
> .path_openat+0x624/0x1580
> [0.231157] [c00049443990] [c0354ce4]
> .do_filp_open+0x84/0x120
> [0.231163] [c00049443b10] [c0338d74]
> .do_sys_open+0x214/0x300
> [0.231170] [c00049443be0] [c0da69ac]
> .md_run_setup+0xa0/0xec
> [0.231176] [c00049443c60] [c0da4fbc]
> .prepare_namespace+0x60/0x240
> [0.231182] [c00049443ce0] [c0da47a8]
> .kernel_init_freeable+0x330/0x36c
> [0.231190] [c00049443db0] [c000dc44] .kernel_init+0x24/0x160
> [0.231197] [c00049443e30] [c000badc]
> .ret_from_kernel_thread+0x58/0x7c
> [0.231202] Instruction dump:
> [0.231206] 6000 3d22ffee 89296bfb 2f89 409effdc 3c62ffc6 3921
> 3d42ffee
> [0.231216] 38630928 992a6bfb 484a6e79 6000 <0fe0> 4bb8
> 6000 6000
> [

[scsi] scsi: ufshcd-platform: remove the useless cast in ERR_PTR/IS_ERR

2017-03-14 Thread Tomas Winkler
IS_ERR and ERR_PTR already forcefully cast their argument,
hence there is no need for additional (complex) casting.

Signed-off-by: Tomas Winkler 
---
 drivers/scsi/ufs/ufshcd-pltfrm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd-pltfrm.c b/drivers/scsi/ufs/ufshcd-pltfrm.c
index a72a4ba78125..8e5e6c04c035 100644
--- a/drivers/scsi/ufs/ufshcd-pltfrm.c
+++ b/drivers/scsi/ufs/ufshcd-pltfrm.c
@@ -309,8 +309,8 @@ int ufshcd_pltfrm_init(struct platform_device *pdev,
 
mem_res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
mmio_base = devm_ioremap_resource(dev, mem_res);
-   if (IS_ERR(*(void **)_base)) {
-   err = PTR_ERR(*(void **)_base);
+   if (IS_ERR(mmio_base)) {
+   err = PTR_ERR(mmio_base);
goto out;
}
 
-- 
2.9.3



Re: [PATCH 08/29] drivers, md: convert mddev.active from atomic_t to refcount_t

2017-03-14 Thread Michael Ellerman
Elena Reshetova  writes:

> refcount_t type and corresponding API should be
> used instead of atomic_t when the variable is used as
> a reference counter. This allows to avoid accidental
> refcounter overflows that might lead to use-after-free
> situations.
>
> Signed-off-by: Elena Reshetova 
> Signed-off-by: Hans Liljestrand 
> Signed-off-by: Kees Cook 
> Signed-off-by: David Windsor 
> ---
>  drivers/md/md.c | 6 +++---
>  drivers/md/md.h | 3 ++-
>  2 files changed, 5 insertions(+), 4 deletions(-)

When booting linux-next (specifically 5be4921c9958ec) I'm seeing the
backtrace below. I suspect this patch is just exposing an existing
issue?

cheers


[0.230738] md: Waiting for all devices to be available before autodetect
[0.230742] md: If you don't use raid, use raid=noautodetect
[0.230962] refcount_t: increment on 0; use-after-free.
[0.230988] [ cut here ]
[0.230996] WARNING: CPU: 0 PID: 1 at lib/refcount.c:114 
.refcount_inc+0x5c/0x70
[0.231001] Modules linked in:
[0.231006] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
4.11.0-rc1-gccN-next-20170310-g5be4921 #1
[0.231012] task: c0004940 task.stack: c0004944
[0.231016] NIP: c05ac6bc LR: c05ac6b8 CTR: c0743390
[0.231021] REGS: c00049443160 TRAP: 0700   Not tainted  
(4.11.0-rc1-gccN-next-20170310-g5be4921)
[0.231026] MSR: 80029032 
[0.231033]   CR: 24024422  XER: 000c
[0.231038] CFAR: c0a5356c SOFTE: 1 
[0.231038] GPR00: c05ac6b8 c000494433e0 c1079d00 
002b 
[0.231038] GPR04:  00ef  
c10418a0 
[0.231038] GPR08: 4af8 c0ecc9a8 c0ecc9a8 
 
[0.231038] GPR12: 28024824 c6bb  
c00049443a00 
[0.231038] GPR16:  c00049443a10  
 
[0.231038] GPR20:   c0f7dd20 
 
[0.231038] GPR24: 014080c0 c12060b8 c1206080 
0009 
[0.231038] GPR28: c0f7dde0 0090  
c000461ae800 
[0.231100] NIP [c05ac6bc] .refcount_inc+0x5c/0x70
[0.231104] LR [c05ac6b8] .refcount_inc+0x58/0x70
[0.231108] Call Trace:
[0.231112] [c000494433e0] [c05ac6b8] .refcount_inc+0x58/0x70 
(unreliable)
[0.231120] [c00049443450] [c086c008] .mddev_find+0x1e8/0x430
[0.231125] [c00049443530] [c0872b6c] .md_open+0x2c/0x140
[0.231132] [c000494435c0] [c03962a4] .__blkdev_get+0xd4/0x520
[0.231138] [c00049443690] [c0396cc0] .blkdev_get+0x1c0/0x4f0
[0.231145] [c00049443790] [c0336d64] 
.do_dentry_open.isra.1+0x2a4/0x410
[0.231152] [c00049443830] [c03523f4] .path_openat+0x624/0x1580
[0.231157] [c00049443990] [c0354ce4] .do_filp_open+0x84/0x120
[0.231163] [c00049443b10] [c0338d74] .do_sys_open+0x214/0x300
[0.231170] [c00049443be0] [c0da69ac] .md_run_setup+0xa0/0xec
[0.231176] [c00049443c60] [c0da4fbc] 
.prepare_namespace+0x60/0x240
[0.231182] [c00049443ce0] [c0da47a8] 
.kernel_init_freeable+0x330/0x36c
[0.231190] [c00049443db0] [c000dc44] .kernel_init+0x24/0x160
[0.231197] [c00049443e30] [c000badc] 
.ret_from_kernel_thread+0x58/0x7c
[0.231202] Instruction dump:
[0.231206] 6000 3d22ffee 89296bfb 2f89 409effdc 3c62ffc6 3921 
3d42ffee 
[0.231216] 38630928 992a6bfb 484a6e79 6000 <0fe0> 4bb8 6000 
6000 
[0.231226] ---[ end trace 8c51f269ad91ffc2 ]---
[0.231233] md: Autodetecting RAID arrays.
[0.231236] md: autorun ...
[0.231239] md: ... autorun DONE.
[0.234188] EXT4-fs (sda4): mounting ext3 file system using the ext4 
subsystem
[0.250506] refcount_t: underflow; use-after-free.
[0.250531] [ cut here ]
[0.250537] WARNING: CPU: 0 PID: 3 at lib/refcount.c:207 
.refcount_dec_not_one+0x104/0x120
[0.250542] Modules linked in:
[0.250546] CPU: 0 PID: 3 Comm: kworker/0:0 Tainted: GW   
4.11.0-rc1-gccN-next-20170310-g5be4921 #1
[0.250553] Workqueue: events .delayed_fput
[0.250557] task: c00049404900 task.stack: c00049448000
[0.250562] NIP: c05ac964 LR: c05ac960 CTR: c0743390
[0.250567] REGS: c0004944b530 TRAP: 0700   Tainted: GW
(4.11.0-rc1-gccN-next-20170310-g5be4921)
[0.250572] MSR: 80029032 
[0.250578]   CR: 24002422  XER: 0007
[0.250584] CFAR: c0a5356c SOFTE: 1 
[0.250584] GPR00: c05ac960 

Re: [PATCH][V2] scsi: cxgb3i: remove redundant null check and kfree on skb

2017-03-14 Thread Johannes Thumshirn
On Tue, Mar 14, 2017 at 10:48:43AM +, Colin King wrote:
> From: Colin Ian King 
> 
> On the error exit path, skb is always null, so the non-null check
> and __kfree_skb call are redundant.  Remove the redundant code and
> just directly return with the appropriate error return code.
> 
> Detected by CoverityScan, CID#114328 ("Logically Dead Code")
> 
> Signed-off-by: Colin Ian King 
> ---
>  drivers/scsi/cxgbi/cxgb3i/cxgb3i.c | 9 ++---
>  1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c 
> b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
> index 1880eb6..3c9f8cf2 100644
> --- a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
> +++ b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
> @@ -979,14 +979,14 @@ static int init_act_open(struct cxgbi_sock *csk)
>   csk->atid = cxgb3_alloc_atid(t3dev, _client, csk);
>   if (csk->atid < 0) {
>   pr_err("NO atid available.\n");
> - goto rel_resource;
> + return -EINVAL;
>   }
>   cxgbi_sock_set_flag(csk, CTPF_HAS_ATID);
>   cxgbi_sock_get(csk);
>  
>   skb = alloc_wr(sizeof(struct cpl_act_open_req), 0, GFP_KERNEL);
>   if (!skb)
> - goto rel_resource;
> + return -ENOMEM;

I don't think that's correct, not that it was before. cxgbi_sock_get(csk) does a
kref_get(>refcnt), so this will at lease leak a kref. It will also "leak"
the atids_in_use in cxgb3_alloc_atid() as there's a call to cxgb3_free_atid()
missing. Looks like the complete cleanup path is worng here.

But I'd prefer having Karen or someone else at Chelsio confirm my assumptions.

Thanks,
Johannes

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH][V2] scsi: cxgb3i: remove redundant null check and kfree on skb

2017-03-14 Thread Dan Carpenter
Thanks!

regards,
dan carpenter



[PATCH][V2] scsi: cxgb3i: remove redundant null check and kfree on skb

2017-03-14 Thread Colin King
From: Colin Ian King 

On the error exit path, skb is always null, so the non-null check
and __kfree_skb call are redundant.  Remove the redundant code and
just directly return with the appropriate error return code.

Detected by CoverityScan, CID#114328 ("Logically Dead Code")

Signed-off-by: Colin Ian King 
---
 drivers/scsi/cxgbi/cxgb3i/cxgb3i.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c 
b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
index 1880eb6..3c9f8cf2 100644
--- a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
+++ b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
@@ -979,14 +979,14 @@ static int init_act_open(struct cxgbi_sock *csk)
csk->atid = cxgb3_alloc_atid(t3dev, _client, csk);
if (csk->atid < 0) {
pr_err("NO atid available.\n");
-   goto rel_resource;
+   return -EINVAL;
}
cxgbi_sock_set_flag(csk, CTPF_HAS_ATID);
cxgbi_sock_get(csk);
 
skb = alloc_wr(sizeof(struct cpl_act_open_req), 0, GFP_KERNEL);
if (!skb)
-   goto rel_resource;
+   return -ENOMEM;
skb->sk = (struct sock *)csk;
set_arp_failure_handler(skb, act_open_arp_failure);
csk->snd_win = cxgb3i_snd_win;
@@ -1007,11 +1007,6 @@ static int init_act_open(struct cxgbi_sock *csk)
cxgbi_sock_set_state(csk, CTP_ACTIVE_OPEN);
send_act_open_req(csk, skb, csk->l2t);
return 0;
-
-rel_resource:
-   if (skb)
-   __kfree_skb(skb);
-   return -EINVAL;
 }
 
 cxgb3_cpl_handler_func cxgb3i_cpl_handlers[NUM_CPL_CMDS] = {
-- 
2.10.2



Re: [PATCH v2] scsi_sysfs: fix hang when removing scsi device

2017-03-14 Thread Israel Rukshin

Hello Bart,

I applied and tested patches 3, 4 and 5.
I am sorry to tell you that it didn't work.
I saw the warning you added and the hang at device_del().

[  333.696462] [ cut here ]
[  333.696471] WARNING: CPU: 11 PID: 321 at 
drivers/scsi/scsi_sysfs.c:1280 __scsi_remove_device+0x106/0x110
[  333.696472] Modules linked in: nfsv3 ib_srp(-) dm_service_time 
scsi_transport_srp ib_uverbs ib_umad ib_ipoib ib_cm mlx4_ib ib_core 
rpcsec_gss_krb5 nfsv4 dns_resolver nfs netconsole fscache dm_mirror 
dm_region_hash dm_log sb_edac edac_core x86_pkg_temp_thermal 
intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul 
crc32_pclmul ghash_clmulni_intel iTCO_wdt pcbc iTCO_vendor_support 
aesni_intel mei_me crypto_simd joydev input_leds ipmi_si ioatdma mei 
glue_helper lpc_ich pcspkr cryptd ipmi_devintf shpchp ipmi_msghandler sg 
i2c_i801 mfd_core dm_multipath dm_mod nfsd binfmt_misc auth_rpcgss 
nfs_acl lockd grace sunrpc ip_tables ext4 jbd2 mbcache sd_mod mgag200 
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops isci igb 
ttm ahci libsas ptp libahci scsi_transport_sas pps_core dca crc32c_intel
[  333.696511]  drm i2c_algo_bit libata mlx4_core fjes [last unloaded: 
ib_srp]
[  333.696516] CPU: 11 PID: 321 Comm: kworker/11:1 Not tainted 
4.11.0-rc2+ #99

[  333.696517] Hardware name: Supermicro X9DRFR/X9DRFR, BIOS 1.0a 09/11/2012
[  333.696522] Workqueue: srp_remove srp_remove_work [ib_srp]
[  333.696523] Call Trace:
[  333.696530]  dump_stack+0x63/0x90
[  333.696534]  __warn+0xcb/0xf0
[  333.696536]  warn_slowpath_null+0x1d/0x20
[  333.696538]  __scsi_remove_device+0x106/0x110
[  333.696540]  scsi_forget_host+0x60/0x70
[  333.696545]  scsi_remove_host+0x77/0x110
[  333.696547]  srp_remove_work+0x90/0x230 [ib_srp]
[  333.696551]  process_one_work+0x177/0x430
[  333.696553]  worker_thread+0x4e/0x4b0
[  333.696555]  kthread+0x101/0x140
[  333.696556]  ? process_one_work+0x430/0x430
[  333.696558]  ? kthread_create_on_node+0x60/0x60
[  333.696563]  ret_from_fork+0x2c/0x40
[  333.696565] ---[ end trace b1edd584bf21aaba ]---

Israel.


On 3/14/2017 4:35 AM, Bart Van Assche wrote:

On Mon, 2017-03-13 at 14:55 -0700, James Bottomley wrote:

This is true, but I don't see how it can cause the host to be freed
before the sdev.  The memory for struct Scsi_Host is freed in the
shost_gendev release routine, which should be pinned by the parent
traversal from sdev.  So it should not be possible for
  scsi_host_dev_release() to be called before
  scsi_device_dev_release_usercontext() becase the latter has the final
put of the parent device.

Hello James,

I will run a bisect to see whether that provides more information about
what caused the change in the reference counting behavior.

Israel, since you did not hit the reference counting issue in your tests,
can you repeat your test with patches 3, 4 and 5 applied?

Thanks,

Bart.




Re: [PATCH] scsi: cxgb3i: remove redundant null check and kfree on skb

2017-03-14 Thread Dan Carpenter
On Mon, Mar 13, 2017 at 07:14:08PM +, Colin King wrote:
> From: Colin Ian King 
> 
> On the error exit path, skb is always null, so the non-null check
> and __kfree_skb call are redundant.  Remove the redundant code,
> rename the rel_release label to err and make error paths jump to
> the err exit path.
> 
> Detected by CoverityScan, CID#114328 ("Logically Dead Code")
> 
> Signed-off-by: Colin Ian King 
> ---
>  drivers/scsi/cxgbi/cxgb3i/cxgb3i.c | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c 
> b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
> index 1880eb6..f453a78 100644
> --- a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
> +++ b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
> @@ -972,21 +972,21 @@ static int init_act_open(struct cxgbi_sock *csk)
> >daddr.sin_addr.s_addr);
>   if (!csk->l2t) {
>   pr_err("NO l2t available.\n");
> - return -EINVAL;
> + goto err;

err is a bad name because it doesn't say what the goto does.  Imagine
if function names were so opaque.  The most readable thing is a direct
return.

Also that's the least bug prone thing.

regards,
dan carpenter



[PATCH 8/9] be2iscsi: Check size before copying ASYNC handle

2017-03-14 Thread Jitendra Bhivare
Data in buffers are gathered into a single buffer before giving to
iSCSI layer. Though less likely to have payload more than 8K in
ASYNC PDU, the data length is provide by FW and check is missing
for overrun.

Signed-off-by: Jitendra Bhivare 
---
 drivers/scsi/be2iscsi/be_main.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/be2iscsi/be_main.c b/drivers/scsi/be2iscsi/be_main.c
index ee1f1c4..4b668c4 100644
--- a/drivers/scsi/be2iscsi/be_main.c
+++ b/drivers/scsi/be2iscsi/be_main.c
@@ -1611,6 +1611,10 @@ beiscsi_hdl_fwd_pdu(struct beiscsi_conn *beiscsi_conn,
dlen = pasync_handle->buffer_len;
continue;
}
+   if (!pasync_handle->buffer_len ||
+   (dlen + pasync_handle->buffer_len) >
+   pasync_ctx->async_data.buffer_size)
+   break;
memcpy(pdata + dlen, pasync_handle->pbuffer,
   pasync_handle->buffer_len);
dlen += pasync_handle->buffer_len;
@@ -1619,8 +1623,9 @@ beiscsi_hdl_fwd_pdu(struct beiscsi_conn *beiscsi_conn,
if (!plast_handle->is_final) {
/* last handle should have final PDU notification from FW */
beiscsi_log(phba, KERN_ERR, BEISCSI_LOG_ISCSI,
-   "BM_%d : cid %u %p fwd async PDU with last handle 
missing - HL%u:DN%u:DR%u\n",
+   "BM_%d : cid %u %p fwd async PDU opcode %x with 
last handle missing - HL%u:DN%u:DR%u\n",
beiscsi_conn->beiscsi_conn_cid, plast_handle,
+   AMAP_GET_BITS(struct amap_pdu_base, opcode, phdr),
pasync_ctx->async_entry[cri].wq.hdr_len,
pasync_ctx->async_entry[cri].wq.bytes_needed,
pasync_ctx->async_entry[cri].wq.bytes_received);
-- 
2.7.4



[PATCH 2/9] be2iscsi: Fix closing of connection

2017-03-14 Thread Jitendra Bhivare
CID needs to be freed even when invalidate or upload connection fails.
Attempt to close connection 3 times before freeing CID.

Set cleanup_type to INVALIDATE instead of force TCP_RST.
This unnecessarily is terminating connection with reset instead of
gracefully closing it.

Set save_cfg to 0 - session not to be saved on flash.

Add delay and process CQ before uploading connection.

Signed-off-by: Jitendra Bhivare 
---
 drivers/scsi/be2iscsi/be.h   |   1 -
 drivers/scsi/be2iscsi/be_cmds.h  |  63 +--
 drivers/scsi/be2iscsi/be_iscsi.c |  98 +-
 drivers/scsi/be2iscsi/be_mgmt.c  | 127 ---
 drivers/scsi/be2iscsi/be_mgmt.h  |  30 ++---
 5 files changed, 159 insertions(+), 160 deletions(-)

diff --git a/drivers/scsi/be2iscsi/be.h b/drivers/scsi/be2iscsi/be.h
index ca9440f..4dd8de4 100644
--- a/drivers/scsi/be2iscsi/be.h
+++ b/drivers/scsi/be2iscsi/be.h
@@ -154,7 +154,6 @@ struct be_ctrl_info {
 #define PAGE_SHIFT_4K 12
 #define PAGE_SIZE_4K (1 << PAGE_SHIFT_4K)
 #define mcc_timeout12 /* 12s timeout */
-#define BEISCSI_LOGOUT_SYNC_DELAY  250
 
 /* Returns number of pages spanned by the data starting at the given addr */
 #define PAGES_4K_SPANNED(_address, size)   \
diff --git a/drivers/scsi/be2iscsi/be_cmds.h b/drivers/scsi/be2iscsi/be_cmds.h
index 1d40e83..88fe731 100644
--- a/drivers/scsi/be2iscsi/be_cmds.h
+++ b/drivers/scsi/be2iscsi/be_cmds.h
@@ -1145,24 +1145,49 @@ struct tcp_connect_and_offload_out {
 #define DB_DEF_PDU_EVENT_SHIFT 15
 #define DB_DEF_PDU_CQPROC_SHIFT16
 
-struct dmsg_cqe {
-   u32 dw[4];
+struct be_invalidate_connection_params_in {
+   struct be_cmd_req_hdr hdr;
+   u32 session_handle;
+   u16 cid;
+   u16 unused;
+#define BE_CLEANUP_TYPE_INVALIDATE 0x8001
+#define BE_CLEANUP_TYPE_ISSUE_TCP_RST  0x8002
+   u16 cleanup_type;
+   u16 save_cfg;
+} __packed;
+
+struct be_invalidate_connection_params_out {
+   u32 session_handle;
+   u16 cid;
+   u16 unused;
 } __packed;
 
-struct tcp_upload_params_in {
+union be_invalidate_connection_params {
+   struct be_invalidate_connection_params_in req;
+   struct be_invalidate_connection_params_out resp;
+} __packed;
+
+struct be_tcp_upload_params_in {
struct be_cmd_req_hdr hdr;
u16 id;
+#define BE_UPLOAD_TYPE_GRACEFUL1
+/* abortive upload with reset */
+#define BE_UPLOAD_TYPE_ABORT_RESET 2
+/* abortive upload without reset */
+#define BE_UPLOAD_TYPE_ABORT   3
+/* abortive upload with reset, sequence number by driver */
+#define BE_UPLOAD_TYPE_ABORT_WITH_SEQ  4
u16 upload_type;
u32 reset_seq;
 } __packed;
 
-struct tcp_upload_params_out {
+struct be_tcp_upload_params_out {
u32 dw[32];
 } __packed;
 
-union tcp_upload_params {
-   struct tcp_upload_params_in request;
-   struct tcp_upload_params_out response;
+union be_tcp_upload_params {
+   struct be_tcp_upload_params_in request;
+   struct be_tcp_upload_params_out response;
 } __packed;
 
 struct be_ulp_fw_cfg {
@@ -1243,10 +1268,7 @@ struct be_cmd_get_port_name {
 #define OPCODE_COMMON_WRITE_FLASH  96
 #define OPCODE_COMMON_READ_FLASH   97
 
-/* --- CMD_ISCSI_INVALIDATE_CONNECTION_TYPE --- */
 #define CMD_ISCSI_COMMAND_INVALIDATE   1
-#define CMD_ISCSI_CONNECTION_INVALIDATE0x8001
-#define CMD_ISCSI_CONNECTION_ISSUE_TCP_RST 0x8002
 
 #define INI_WR_CMD 1   /* Initiator write command */
 #define INI_TMF_CMD2   /* Initiator TMF command */
@@ -1269,27 +1291,6 @@ struct be_cmd_get_port_name {
 *  preparedby
 * driver should not be touched
 */
-/* --- CMD_CHUTE_TYPE --- */
-#define CMD_CONNECTION_CHUTE_0 1
-#define CMD_CONNECTION_CHUTE_1 2
-#define CMD_CONNECTION_CHUTE_2 3
-
-#define EQ_MAJOR_CODE_COMPLETION   0
-
-#define CMD_ISCSI_SESSION_DEL_CFG_FROM_FLASH 0
-#define CMD_ISCSI_SESSION_SAVE_CFG_ON_FLASH 1
-
-/* --- CONNECTION_UPLOAD_PARAMS --- */
-/* These parameters are used to define the type of upload desired.  */
-#define CONNECTION_UPLOAD_GRACEFUL  1  /* Graceful upload  */
-#define CONNECTION_UPLOAD_ABORT_RESET   2  /* Abortive upload with
-* reset
-*/
-#define CONNECTION_UPLOAD_ABORT3   /* Abortive upload 
without
-* reset
-*/
-#define CONNECTION_UPLOAD_ABORT_WITH_SEQ 4 /* Abortive upload with reset,
-* sequence number by driver  */
 
 /* 

[PATCH 1/9] be2iscsi: Check tag in beiscsi_mccq_compl_wait

2017-03-14 Thread Jitendra Bhivare
scsi host12: BS_1377 : mgmt_invalidate_connection Failed for cid=256
BUG: unable to handle kernel NULL pointer dereference at 0008
IP: [] __list_add+0xf/0xc0
PGD 0
Oops:  [#1] SMP
Modules linked in:
...
CPU: 9 PID: 1542 Comm: iscsid Tainted: G    T 
3.10.0-514.el7.x86_64 #1
Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 09/12/2016
task: 88076f310fb0 ti: 88076bba8000 task.ti: 88076bba8000
RIP: 0010:[]  [] __list_add+0xf/0xc0
RSP: 0018:88076bbab8e8  EFLAGS: 00010046
RAX: 0246 RBX: 88076bbab990 RCX: 
RDX:  RSI: 880468badf58 RDI: 88076bbab990
RBP: 88076bbab900 R08: 0246 R09: 20de
R10:  R11: 88076bbab5be R12: 
R13: 880468badf58 R14: 0001adb0 R15: 88076f310fb0
FS:  7f377124a880() GS:88046fa4() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0008 CR3: 000771318000 CR4: 001407e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Stack:
88076bbab990 880468badf50 0001 88076bbab938
810b128b 0246 cf9b7040 880468bac7a0
 880468bac7a0 88076bbab9d0 a05a6ea3

Call Trace:
[] prepare_to_wait+0x7b/0x90
[] beiscsi_mccq_compl_wait+0x153/0x330 [be2iscsi]
[] ? wake_up_atomic_t+0x30/0x30
[] beiscsi_ep_disconnect+0x91/0x2d0 [be2iscsi]
[] iscsi_if_ep_disconnect.isra.14+0x5a/0x70 
[scsi_transport_iscsi]
[] iscsi_if_recv_msg+0x113b/0x14a0 [scsi_transport_iscsi]
[] ? __kmalloc_node_track_caller+0x58/0x290
[] iscsi_if_rx+0x8e/0x1f0 [scsi_transport_iscsi]
[] netlink_unicast+0xed/0x1b0
[] netlink_sendmsg+0x31e/0x690
[] ? netlink_rcv_wake+0x44/0x60
[] ? netlink_recvmsg+0x1e3/0x450

beiscsi_mccq_compl_wait gets called even when MCC tag allocation failed
for mgmt_invalidate_connection.
mcc_wait is not initialized for tag 0 so causes crash in prepare_to_wait.

Signed-off-by: Jitendra Bhivare 
---
 drivers/scsi/be2iscsi/be_cmds.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/scsi/be2iscsi/be_cmds.c b/drivers/scsi/be2iscsi/be_cmds.c
index 5d59e263..d14ddb2 100644
--- a/drivers/scsi/be2iscsi/be_cmds.c
+++ b/drivers/scsi/be2iscsi/be_cmds.c
@@ -246,6 +246,12 @@ int beiscsi_mccq_compl_wait(struct beiscsi_hba *phba,
 {
int rc = 0;
 
+   if (!tag || tag > MAX_MCC_CMD) {
+   __beiscsi_log(phba, KERN_ERR,
+ "BC_%d : invalid tag %u\n", tag);
+   return -EINVAL;
+   }
+
if (beiscsi_hba_in_error(phba)) {
clear_bit(MCC_TAG_STATE_RUNNING,
  >ctrl.ptag_state[tag].tag_state);
-- 
2.7.4



[PATCH 7/9] be2iscsi: Remove free_list for ASYNC handles

2017-03-14 Thread Jitendra Bhivare
With previous patch adding ASYNC Rx buffers to free_list is not required.
Remove all free_list related operations.

Add in_use to track if buffer posted is being processed by driver and
purge all buffers received for connection if found so.

Signed-off-by: Jitendra Bhivare 
---
 drivers/scsi/be2iscsi/be_main.c | 99 +
 drivers/scsi/be2iscsi/be_main.h |  8 +---
 2 files changed, 43 insertions(+), 64 deletions(-)

diff --git a/drivers/scsi/be2iscsi/be_main.c b/drivers/scsi/be2iscsi/be_main.c
index 1cd9c2d..ee1f1c4 100644
--- a/drivers/scsi/be2iscsi/be_main.c
+++ b/drivers/scsi/be2iscsi/be_main.c
@@ -1453,15 +1453,28 @@ static inline void
 beiscsi_hdl_put_handle(struct hd_async_context *pasync_ctx,
 struct hd_async_handle *pasync_handle)
 {
-   if (pasync_handle->is_header) {
-   list_add_tail(_handle->link,
-   _ctx->async_header.free_list);
-   pasync_ctx->async_header.free_entries++;
-   } else {
-   list_add_tail(_handle->link,
-   _ctx->async_data.free_list);
-   pasync_ctx->async_data.free_entries++;
-   }
+   pasync_handle->is_final = 0;
+   pasync_handle->buffer_len = 0;
+   pasync_handle->in_use = 0;
+   list_del_init(_handle->link);
+}
+
+static void
+beiscsi_hdl_purge_handles(struct beiscsi_hba *phba,
+ struct hd_async_context *pasync_ctx,
+ u16 cri)
+{
+   struct hd_async_handle *pasync_handle, *tmp_handle;
+   struct list_head *plist;
+
+   plist  = _ctx->async_entry[cri].wq.list;
+   list_for_each_entry_safe(pasync_handle, tmp_handle, plist, link)
+   beiscsi_hdl_put_handle(pasync_ctx, pasync_handle);
+
+   INIT_LIST_HEAD(_ctx->async_entry[cri].wq.list);
+   pasync_ctx->async_entry[cri].wq.hdr_len = 0;
+   pasync_ctx->async_entry[cri].wq.bytes_received = 0;
+   pasync_ctx->async_entry[cri].wq.bytes_needed = 0;
 }
 
 static struct hd_async_handle *
@@ -1473,11 +1486,12 @@ beiscsi_hdl_get_handle(struct beiscsi_conn 
*beiscsi_conn,
struct beiscsi_hba *phba = beiscsi_conn->phba;
struct hd_async_handle *pasync_handle;
struct be_bus_address phys_addr;
+   u16 cid, code, ci, cri;
u8 final, error = 0;
-   u16 cid, code, ci;
u32 dpl;
 
cid = beiscsi_conn->beiscsi_conn_cid;
+   cri = BE_GET_ASYNC_CRI_FROM_CID(cid);
/**
 * This function is invoked to get the right async_handle structure
 * from a given DEF PDU CQ entry.
@@ -1525,15 +1539,7 @@ beiscsi_hdl_get_handle(struct beiscsi_conn *beiscsi_conn,
break;
/* called only for above codes */
default:
-   pasync_handle = NULL;
-   break;
-   }
-
-   if (!pasync_handle) {
-   beiscsi_log(phba, KERN_ERR, BEISCSI_LOG_ISCSI,
-   "BM_%d : cid %d async PDU handle not found - code 
%d ci %d addr %llx\n",
-   cid, code, ci, phys_addr.u.a64.address);
-   return pasync_handle;
+   return NULL;
}
 
if (pasync_handle->pa.u.a64.address != phys_addr.u.a64.address ||
@@ -1549,44 +1555,33 @@ beiscsi_hdl_get_handle(struct beiscsi_conn 
*beiscsi_conn,
/* FW has stale address - attempt continuing by dropping */
}
 
-   list_del_init(_handle->link);
-   /**
-* Each CID is associated with unique CRI.
-* ASYNC_CRI_FROM_CID mapping and CRI_FROM_CID are totaly different.
-**/
-   pasync_handle->cri = BE_GET_ASYNC_CRI_FROM_CID(cid);
-   pasync_handle->is_final = final;
-   pasync_handle->buffer_len = dpl;
-
/**
 * DEF PDU header and data buffers with errors should be simply
 * dropped as there are no consumers for it.
 */
if (error) {
beiscsi_hdl_put_handle(pasync_ctx, pasync_handle);
-   pasync_handle = NULL;
+   return NULL;
}
-   return pasync_handle;
-}
-
-static void
-beiscsi_hdl_purge_handles(struct beiscsi_hba *phba,
- struct hd_async_context *pasync_ctx,
- u16 cri)
-{
-   struct hd_async_handle *pasync_handle, *tmp_handle;
-   struct list_head *plist;
 
-   plist  = _ctx->async_entry[cri].wq.list;
-   list_for_each_entry_safe(pasync_handle, tmp_handle, plist, link) {
-   list_del(_handle->link);
-   beiscsi_hdl_put_handle(pasync_ctx, pasync_handle);
+   if (pasync_handle->in_use || !list_empty(_handle->link)) {
+   beiscsi_log(phba, KERN_ERR, BEISCSI_LOG_ISCSI,
+   "BM_%d : cid %d async PDU handle in use - code %d 
ci %d addr %llx\n",
+   cid, code, ci, phys_addr.u.a64.address);
+   

[PATCH 9/9] be2iscsi: Update driver version

2017-03-14 Thread Jitendra Bhivare
Version 11.4.0.0

Signed-off-by: Jitendra Bhivare 
---
 drivers/scsi/be2iscsi/be_main.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/be2iscsi/be_main.h b/drivers/scsi/be2iscsi/be_main.h
index 9883a85..0f826fe 100644
--- a/drivers/scsi/be2iscsi/be_main.h
+++ b/drivers/scsi/be2iscsi/be_main.h
@@ -36,7 +36,7 @@
 #include 
 
 #define DRV_NAME   "be2iscsi"
-#define BUILD_STR  "11.2.1.0"
+#define BUILD_STR  "11.4.0.0"
 #define BE_NAME"Emulex OneConnect" \
"Open-iSCSI Driver version" BUILD_STR
 #define DRV_DESC   BE_NAME " " "Driver"
-- 
2.7.4



[PATCH 0/9] be2iscsi: driver update 11.4.0.0

2017-03-14 Thread Jitendra Bhivare
This patch is generated against for-next branch.

Jitendra Bhivare (9):
  be2iscsi: Check tag in beiscsi_mccq_compl_wait
  be2iscsi: Fix closing of connection
  be2iscsi: Replace spin_unlock_bh with spin_lock
  scsi_transport_iscsi: Use flush_work in iscsi_remove_session
  be2iscsi: Increase HDQ default queue size
  be2iscsi: Use num_cons field in Rx CQE
  be2iscsi: Remove free_list for ASYNC handles
  be2iscsi: Check size before copying ASYNC handle
  be2iscsi: Update driver version

 drivers/scsi/be2iscsi/be.h  |   1 -
 drivers/scsi/be2iscsi/be_cmds.c |   6 +
 drivers/scsi/be2iscsi/be_cmds.h |  63 -
 drivers/scsi/be2iscsi/be_iscsi.c|  98 --
 drivers/scsi/be2iscsi/be_main.c | 257 
 drivers/scsi/be2iscsi/be_main.h |  15 +--
 drivers/scsi/be2iscsi/be_mgmt.c | 127 +-
 drivers/scsi/be2iscsi/be_mgmt.h |  30 +
 drivers/scsi/scsi_transport_iscsi.c |   3 +-
 9 files changed, 283 insertions(+), 317 deletions(-)

-- 
2.7.4



[PATCH 6/9] be2iscsi: Use num_cons field in Rx CQE

2017-03-14 Thread Jitendra Bhivare
FW runs out of buffer if buffers are not posted back soon.
ASYNC Rx CQE indicates that FW has consumed 8 RQEs.
Use it to post back buffers instead of waiting for buffers
to be processed and freed by driver.

Signed-off-by: Jitendra Bhivare 
---
 drivers/scsi/be2iscsi/be_main.c | 124 
 drivers/scsi/be2iscsi/be_main.h |   1 +
 2 files changed, 51 insertions(+), 74 deletions(-)

diff --git a/drivers/scsi/be2iscsi/be_main.c b/drivers/scsi/be2iscsi/be_main.c
index b76fd26..1cd9c2d 100644
--- a/drivers/scsi/be2iscsi/be_main.c
+++ b/drivers/scsi/be2iscsi/be_main.c
@@ -1467,7 +1467,8 @@ beiscsi_hdl_put_handle(struct hd_async_context 
*pasync_ctx,
 static struct hd_async_handle *
 beiscsi_hdl_get_handle(struct beiscsi_conn *beiscsi_conn,
   struct hd_async_context *pasync_ctx,
-  struct i_t_dpdu_cqe *pdpdu_cqe)
+  struct i_t_dpdu_cqe *pdpdu_cqe,
+  u8 *header)
 {
struct beiscsi_hba *phba = beiscsi_conn->phba;
struct hd_async_handle *pasync_handle;
@@ -1515,6 +1516,7 @@ beiscsi_hdl_get_handle(struct beiscsi_conn *beiscsi_conn,
switch (code) {
case UNSOL_HDR_NOTIFY:
pasync_handle = pasync_ctx->async_entry[ci].header;
+   *header = 1;
break;
case UNSOL_DATA_DIGEST_ERROR_NOTIFY:
error = 1;
@@ -1547,6 +1549,7 @@ beiscsi_hdl_get_handle(struct beiscsi_conn *beiscsi_conn,
/* FW has stale address - attempt continuing by dropping */
}
 
+   list_del_init(_handle->link);
/**
 * Each CID is associated with unique CRI.
 * ASYNC_CRI_FROM_CID mapping and CRI_FROM_CID are totaly different.
@@ -1554,11 +1557,6 @@ beiscsi_hdl_get_handle(struct beiscsi_conn *beiscsi_conn,
pasync_handle->cri = BE_GET_ASYNC_CRI_FROM_CID(cid);
pasync_handle->is_final = final;
pasync_handle->buffer_len = dpl;
-   /* empty the slot */
-   if (pasync_handle->is_header)
-   pasync_ctx->async_entry[ci].header = NULL;
-   else
-   pasync_ctx->async_entry[ci].data = NULL;
 
/**
 * DEF PDU header and data buffers with errors should be simply
@@ -1708,85 +1706,53 @@ beiscsi_hdl_gather_pdu(struct beiscsi_conn 
*beiscsi_conn,
 
 static void
 beiscsi_hdq_post_handles(struct beiscsi_hba *phba,
-u8 header, u8 ulp_num)
+u8 header, u8 ulp_num, u16 nbuf)
 {
-   struct hd_async_handle *pasync_handle, *tmp, **slot;
+   struct hd_async_handle *pasync_handle;
struct hd_async_context *pasync_ctx;
struct hwi_controller *phwi_ctrlr;
-   struct list_head *hfree_list;
struct phys_addr *pasync_sge;
u32 ring_id, doorbell = 0;
u32 doorbell_offset;
-   u16 prod = 0, cons;
-   u16 index;
+   u16 prod, pi;
 
phwi_ctrlr = phba->phwi_ctrlr;
pasync_ctx = HWI_GET_ASYNC_PDU_CTX(phwi_ctrlr, ulp_num);
if (header) {
-   cons = pasync_ctx->async_header.free_entries;
-   hfree_list = _ctx->async_header.free_list;
+   pasync_sge = pasync_ctx->async_header.ring_base;
+   pi = pasync_ctx->async_header.pi;
ring_id = phwi_ctrlr->default_pdu_hdr[ulp_num].id;
doorbell_offset = phwi_ctrlr->default_pdu_hdr[ulp_num].
doorbell_offset;
} else {
-   cons = pasync_ctx->async_data.free_entries;
-   hfree_list = _ctx->async_data.free_list;
+   pasync_sge = pasync_ctx->async_data.ring_base;
+   pi = pasync_ctx->async_data.pi;
ring_id = phwi_ctrlr->default_pdu_data[ulp_num].id;
doorbell_offset = phwi_ctrlr->default_pdu_data[ulp_num].
doorbell_offset;
}
-   /* number of entries posted must be in multiples of 8 */
-   if (cons % 8)
-   return;
 
-   list_for_each_entry_safe(pasync_handle, tmp, hfree_list, link) {
-   list_del_init(_handle->link);
-   pasync_handle->is_final = 0;
-   pasync_handle->buffer_len = 0;
-
-   /* handles can be consumed out of order, use index in handle */
-   index = pasync_handle->index;
-   WARN_ON(pasync_handle->is_header != header);
+   for (prod = 0; prod < nbuf; prod++) {
if (header)
-   slot = _ctx->async_entry[index].header;
+   pasync_handle = pasync_ctx->async_entry[pi].header;
else
-   slot = _ctx->async_entry[index].data;
-   /**
-* The slot just tracks handle's hold and release, so
-* overwriting at the same index won't do any harm but
-* needs to be caught.
-

[PATCH 4/9] scsi_transport_iscsi: Use flush_work in iscsi_remove_session

2017-03-14 Thread Jitendra Bhivare
scsi_flush_work flushes workqueue for the Scsi_Host.
In iSCSI offload enabled host, this would wait for all other
sessions under the host.

Use flush_work for the session being removed instead.

Signed-off-by: Jitendra Bhivare 
---
 drivers/scsi/scsi_transport_iscsi.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_transport_iscsi.c 
b/drivers/scsi/scsi_transport_iscsi.c
index 568c9f2..a424eae 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -2158,7 +2158,6 @@ static int iscsi_iter_destroy_conn_fn(struct device *dev, 
void *data)
 
 void iscsi_remove_session(struct iscsi_cls_session *session)
 {
-   struct Scsi_Host *shost = iscsi_session_to_shost(session);
unsigned long flags;
int err;
 
@@ -2185,7 +2184,7 @@ void iscsi_remove_session(struct iscsi_cls_session 
*session)
 
scsi_target_unblock(>dev, SDEV_TRANSPORT_OFFLINE);
/* flush running scans then delete devices */
-   scsi_flush_work(shost);
+   flush_work(>scan_work);
__iscsi_unbind_session(>unbind_work);
 
/* hw iscsi may not have removed all connections from session */
-- 
2.7.4



[PATCH 5/9] be2iscsi: Increase HDQ default queue size

2017-03-14 Thread Jitendra Bhivare
Currently, ASYNC PDU default queue size is set to max connections.
This leaves only one buffer per connection for any ASYNC PDUs from
targets.

Double the size of the default queue.

Signed-off-by: Jitendra Bhivare 
---
 drivers/scsi/be2iscsi/be_main.c | 29 ++---
 drivers/scsi/be2iscsi/be_main.h |  4 +++-
 2 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/drivers/scsi/be2iscsi/be_main.c b/drivers/scsi/be2iscsi/be_main.c
index ff48573..b76fd26 100644
--- a/drivers/scsi/be2iscsi/be_main.c
+++ b/drivers/scsi/be2iscsi/be_main.c
@@ -636,7 +636,6 @@ static void beiscsi_get_params(struct beiscsi_hba *phba)
(total_cid_count +
 BE2_TMFS + BE2_NOPOUT_REQ));
phba->params.cxns_per_ctrl = total_cid_count;
-   phba->params.asyncpdus_per_ctrl = total_cid_count;
phba->params.icds_per_ctrl = total_icd_count;
phba->params.num_sge_per_io = BE2_SGE;
phba->params.defpdu_hdr_sz = BE2_DEFPDU_HDR_SZ;
@@ -2407,22 +2406,22 @@ static void beiscsi_find_mem_req(struct beiscsi_hba 
*phba)
if (test_bit(ulp_num, >fw_config.ulp_supported)) {
 
num_async_pdu_buf_sgl_pages =
-   PAGES_REQUIRED(BEISCSI_GET_CID_COUNT(
+   PAGES_REQUIRED(BEISCSI_ASYNC_HDQ_SIZE(
   phba, ulp_num) *
   sizeof(struct phys_addr));
 
num_async_pdu_buf_pages =
-   PAGES_REQUIRED(BEISCSI_GET_CID_COUNT(
+   PAGES_REQUIRED(BEISCSI_ASYNC_HDQ_SIZE(
   phba, ulp_num) *
   phba->params.defpdu_hdr_sz);
 
num_async_pdu_data_pages =
-   PAGES_REQUIRED(BEISCSI_GET_CID_COUNT(
+   PAGES_REQUIRED(BEISCSI_ASYNC_HDQ_SIZE(
   phba, ulp_num) *
   phba->params.defpdu_data_sz);
 
num_async_pdu_data_sgl_pages =
-   PAGES_REQUIRED(BEISCSI_GET_CID_COUNT(
+   PAGES_REQUIRED(BEISCSI_ASYNC_HDQ_SIZE(
   phba, ulp_num) *
   sizeof(struct phys_addr));
 
@@ -2459,21 +2458,21 @@ static void beiscsi_find_mem_req(struct beiscsi_hba 
*phba)
mem_descr_index = (HWI_MEM_ASYNC_HEADER_HANDLE_ULP0 +
  (ulp_num * MEM_DESCR_OFFSET));
phba->mem_req[mem_descr_index] =
- BEISCSI_GET_CID_COUNT(phba, ulp_num) *
- sizeof(struct hd_async_handle);
+   BEISCSI_ASYNC_HDQ_SIZE(phba, ulp_num) *
+   sizeof(struct hd_async_handle);
 
mem_descr_index = (HWI_MEM_ASYNC_DATA_HANDLE_ULP0 +
  (ulp_num * MEM_DESCR_OFFSET));
phba->mem_req[mem_descr_index] =
- BEISCSI_GET_CID_COUNT(phba, ulp_num) *
- sizeof(struct hd_async_handle);
+   BEISCSI_ASYNC_HDQ_SIZE(phba, ulp_num) *
+   sizeof(struct hd_async_handle);
 
mem_descr_index = (HWI_MEM_ASYNC_PDU_CONTEXT_ULP0 +
  (ulp_num * MEM_DESCR_OFFSET));
phba->mem_req[mem_descr_index] =
- sizeof(struct hd_async_context) +
-(BEISCSI_GET_CID_COUNT(phba, ulp_num) *
- sizeof(struct hd_async_entry));
+   sizeof(struct hd_async_context) +
+   (BEISCSI_ASYNC_HDQ_SIZE(phba, ulp_num) *
+sizeof(struct hd_async_entry));
}
}
 }
@@ -2757,7 +2756,7 @@ static int hwi_init_async_pdu_ctx(struct beiscsi_hba 
*phba)
((long unsigned int)pasync_ctx +
sizeof(struct hd_async_context));
 
-   pasync_ctx->num_entries = BEISCSI_GET_CID_COUNT(phba,
+   pasync_ctx->num_entries = BEISCSI_ASYNC_HDQ_SIZE(phba,
  ulp_num);
/* setup header buffers */
mem_descr = (struct be_mem_descriptor *)phba->init_mem;
@@ -2895,7 +2894,7 @@ static int hwi_init_async_pdu_ctx(struct beiscsi_hba 
*phba)

[PATCH 3/9] be2iscsi: Replace spin_unlock_bh with spin_lock

2017-03-14 Thread Jitendra Bhivare
spin_unlock_bh back_lock is used in beiscsi_eh_device_reset instead of
spin_lock.

Signed-off-by: Jitendra Bhivare 
---
 drivers/scsi/be2iscsi/be_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/be2iscsi/be_main.c b/drivers/scsi/be2iscsi/be_main.c
index 32b2713..ff48573 100644
--- a/drivers/scsi/be2iscsi/be_main.c
+++ b/drivers/scsi/be2iscsi/be_main.c
@@ -337,7 +337,7 @@ static int beiscsi_eh_device_reset(struct scsi_cmnd *sc)
inv_tbl->task[nents] = task;
nents++;
}
-   spin_unlock_bh(>back_lock);
+   spin_unlock(>back_lock);
spin_unlock_bh(>frwd_lock);
 
rc = SUCCESS;
-- 
2.7.4