RE: [PATCH net-next] qed*: Utilize Firmware 8.15.3.0

2017-03-09 Thread Mintz, Yuval
> > We can't just require a new firmware version in the driver, as users
> > most likely won't have it by the time they install the new kernel.  So
> > you'll have to support the old firmware version as well.
> 
> Why not? That has been the paradigm forever.
> 
> The new firmware version is already available in linux-firmware.
> In any reasonable distro that would update the driver, you'd expect they'd
> also update the firmware version on their filesystem.

Just in case it wasn't clear from the original message - 
The firmware discussed here is the binary firmware, not the
management firmware.
For the management firmware [on persistent storage] there is
a backward/forward compatibility scheme in place.


RE: [PATCH net-next] qed*: Utilize Firmware 8.15.3.0

2017-03-09 Thread Mintz, Yuval
> We can't just require a new firmware version in the driver, as users most
> likely won't have it by the time they install the new kernel.  So you'll have 
> to
> support the old firmware version as well.

Why not? That has been the paradigm forever.

The new firmware version is already available in linux-firmware.
In any reasonable distro that would update the driver, you'd expect
they'd also update the firmware version on their filesystem.



[PATCH] libata: make ata_sg_clean static over again

2017-03-09 Thread Jason Yan
Fixes the following sparse warning:

drivers/ata/libata-core.c:4913:6: warning: symbol 'ata_sg_clean' was not
declared. Should it be static?

Signed-off-by: Jason Yan 
---
 drivers/ata/libata-core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index ca75823..5d33699 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4910,7 +4910,7 @@ void ata_sg_init(struct ata_queued_cmd *qc, struct 
scatterlist *sg,
  * LOCKING:
  * spin_lock_irqsave(host lock)
  */
-void ata_sg_clean(struct ata_queued_cmd *qc)
+static void ata_sg_clean(struct ata_queued_cmd *qc)
 {
struct ata_port *ap = qc->ap;
struct scatterlist *sg = qc->sg;
-- 
2.5.0



Re: how to unmap pages in an anonymous mmap?

2017-03-09 Thread Xiubo Li



On 2017年02月28日 03:32, Andy Grover wrote:

On 02/26/2017 09:59 PM, Xiubo Li wrote:

But, We likely don't want to release memory from the data area anyways
while active, in any case. How about if we set a timer when active
commands go to zero, and then reduce data area to some minimum if no new
cmds come in before timer expires?

If I understand correctly: for example, we have 1G(as the minimum)
data area and all blocks have been allocated and mapped to runner's
vma, then we extern it to 1G + 256M as needed. When there have no
active cmds and after the timer expires, will it reduce the data area
back to 1G ? And then should it release the reduced 256M data area's
memories ?

If so, after kfree()ed the blocks' memories, it should also try to remove
all the ptes which are mapping this page(like using the try_to_umap()),
but something like try_to_umap() doesn't export for the modules.

Without ummaping the kfree()ed pages' ptes mentioned above, then
the reduced 256M vma space couldn't be reused again for the runner
process, because the runner has already do the mapping for the reduced
vma space to some old physical pages(won't trigger new page fault
again). Then there will be a hole, and the hole will be bigger and bigger.

Without ummaping the kfree()ed pages' ptes mentioned above, the
pages' reference count (page_ref_dec(), which _inc()ed in page fault)
couldn't be reduced back too.

Let's ask people who will know...

Hi linux-mm,

TCM-User (drivers/target/target_core_user.c) currently uses vmalloc()ed
memory to back a ring buffer that is mmaped by userspace.

We want to move to dynamically mapping pages into this region, and also
we'd like to unmap/free pages when idle. What's the right way to unmap?
I see unmap_mapping_range() but that mentions an underlying file, which
TCMU doesn't have. Or maybe zap_page_range()? But it's not exported.

Hi linux-mm

For the TCMU case, the vm is not anonymous mapping. And still has
device file desc:

mmap(NULL, len, PROT_READ|PROT_WRITE, MAP_SHARED, dev->fd, 0);

If using the unmap_mapping_range() to do the dynamically maping,
is it okay ? Any other potential risks ?

Or the mentioned 'underlying file' is must one desk file ?

Thanks very much,

BRs
Xiubo




Any advice?

Thanks in advance -- Regards -- Andy






[Bug 194837] VM with virtio-scsi drive often crashes during boot with kernel 4.11rc1

2017-03-09 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=194837

--- Comment #2 from Adam Williamson (ad...@happyassassin.net) ---
I seem to get a different traceback every time this happens in local testing,
aside from the times it just hangs - but it always seems to hang or crash at
approximately the same point in the boot process, right after the "virtio-pci
:00:0a.0: virtio_pci: leaving for legacy driver" messages. The last one I
got was:

[1.327442] Kernel panic - not syncing: CRED: put_cred_rcu() sees
938bb584c000 with usage 1953655158
[1.327442] 
[1.328949] CPU: 1 PID: 21 Comm: rcuos/1 Not tainted
4.11.0-0.rc1.git0.1.fc26.x86_64 #1
[1.328949] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
Bochs 01/01/2011
[1.331157] Call Trace:
[1.331157]  dump_stack+0x63/0x84
[1.331157]  panic+0xe4/0x22d
[1.331157]  ? __prepare_to_swait+0x52/0x80
[1.331157]  put_cred_rcu+0xb0/0xb0
[1.331157]  rcu_nocb_kthread+0x15f/0x500
[1.335141]  ? get_state_synchronize_sched+0x20/0x20
[1.335141]  kthread+0x11e/0x140
[1.335141]  ? kthread_park+0x90/0x90
[1.335141]  ret_from_fork+0x2c/0x40
[1.335141] Kernel Offset: 0x3400 from 0x8100 (relocation
range: 0x8000-0xbfff)
[1.335141] ---[ end Kernel panic - not syncing: CRED: put_cred_rcu() sees
938bb584c000 with usage 1953655158

aside from that I've seen:

[1.419027] Call Trace:
[1.435026]  
[1.435026]  mpage_end_io+0x39/0x60
[1.435026]  bio_endio+0x54/0x60
[1.435026]  blk_update_request+0x8d/0x340
[1.435026]  blk_mq_end_request+0x1a/0x80
[1.435026]  virtblk_request_done+0x45/0x80 [virtio_blk]
[1.435026]  __blk_mq_complete_request+0xc9/0x120
[1.435026]  blk_mq_complete_request+0x1e/0x20
[1.435026]  virtblk_done+0x74/0x100 [virtio_blk]
[1.435026]  vring_interrupt+0x34/0x80 [virtio_ring]
[1.435026]  __handle_irq_event_percpu+0x3f/0x1a0
[1.435026]  handle_irq_event_percpu+0x32/0x80
[1.435026]  handle_irq_event+0x2c/0x50
[1.435026]  handle_edge_irq+0x6f/0x120
[1.435026]  handle_irq+0xad/0x130
[1.435026]  do_IRQ+0x46/0xd0
[1.435026]  common_interrupt+0x93/0x93
[1.435026] RIP: 0010:native_safe_halt+0x6/0x10
[1.435026] RSP: 0018:aae03de0 EFLAGS: 0246 ORIG_RAX:
ff7e
[1.435026] RAX:  RBX: aae104c0 RCX:

[1.435026] RDX:  RSI:  RDI:

[1.435026] RBP: aae03de0 R08: 1c32e304 R09:
883d767b1f00
[1.435026] R10:  R11:  R12:

[1.435026] R13: aae104c0 R14:  R15:

[1.435026]  
[1.435026]  default_idle+0x20/0xe0
[1.435026]  arch_cpu_idle+0xf/0x20
[1.435026]  default_idle_call+0x23/0x30
[1.435026]  do_idle+0x170/0x200
[1.435026]  cpu_startup_entry+0x71/0x80
[1.435026]  rest_init+0x77/0x80
[1.435026]  start_kernel+0x45a/0x47b
[1.435026]  ? early_idt_handler_array+0x120/0x120
[1.435026]  x86_64_start_reservations+0x24/0x26
[1.435026]  x86_64_start_kernel+0x13c/0x15f
[1.435026]  start_cpu+0x14/0x14
[1.435026] Code: ff c3 b8 f8 ff ff ff c3 66 0f 1f 84 00 00 00 00 00 66 66
66 66 90 55 48 89 e5 53 48 89 fb 48 83 ec 08 40 84 f6 75 12 85 d2 75 21 <48> 8b
47 20 a8 01 75 66 f0 80 0f 08 eb 2a 85 d2 75 35 48 89 df 
[1.435026] RIP: page_endio+0x1a/0xa0 RSP: 883dbfc03d68
[1.435026] CR2: 0020
[1.435026] ---[ end trace cf261be33b5d0d05 ]---
[1.435026] Kernel panic - not syncing: Fatal exception in interrupt
[1.435026] Kernel Offset: 0x2900 from 0x8100 (relocation
range: 0x8000-0xbfff)
[1.435026] ---[ end Kernel panic - not syncing: Fatal exception in
interrupt

and:

[1.530432] general protection fault:  [#1] SMP
[1.531004] Modules linked in: virtio_scsi(+) virtio_console qxl
drm_kms_helper ttm drm crc32c_intel sym53c8xx serio_raw scsi_transport_spi
virtio_pci virtio_ring virtio ata_generic pata_acpi qemu_fw_cfg
[1.531004] CPU: 0 PID: 314 Comm: systemd-udevd Not tainted
4.11.0-0.rc1.git0.1.fc26.x86_64 #1
[1.531004] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
Bochs 01/01/2011
[1.531004] task: 9c49760c task.stack: bfbcc05d8000
[1.531004] RIP: 0010:kmem_cache_alloc+0x84/0x1b0
[1.531004] RSP: 0018:bfbcc05dbe60 EFLAGS: 00010246
[1.531004] RAX: 2d316f6974726976 RBX: 014000c0 RCX:
7fcf9e251b30
[1.531004] RDX: 2192 RSI: 014000c0 RDI:
0001d780
[1.531004] RBP: bfbcc05dbe90 R08: 9c49bfc1d780 R09:
0077
[1.531004] R10:  R11:  R12:
014000c0
[1.531004] R13: 9c49bd098140 R14: 2d316f6974726976 R15:
9c49bd098140
[1.531004] FS:  7fcf9f3398c0() GS:9c49bfc0()

[Bug 194837] VM with virtio-scsi drive often crashes during boot with kernel 4.11rc1

2017-03-09 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=194837

Adam Williamson (ad...@happyassassin.net) changed:

   What|Removed |Added

 Regression|No  |Yes

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Bug 194837] VM with virtio-scsi drive often crashes during boot with kernel 4.11rc1

2017-03-09 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=194837

Adam Williamson (ad...@happyassassin.net) changed:

   What|Removed |Added

   Tree|Mainline|Fedora

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Bug 194837] VM with virtio-scsi drive often crashes during boot with kernel 4.11rc1

2017-03-09 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=194837

--- Comment #1 from Adam Williamson (ad...@happyassassin.net) ---
Downstream bug is https://bugzilla.redhat.com/show_bug.cgi?id=1430043 , Fedora
kernel maintainers can provide info on the delta between the affected kernel
builds, I hope. The changelog just says:

* Mon Mar 06 2017 Laura Abbott - 4.11.0-0.rc1.git0.1
- Linux v4.11-rc1

* Mon Mar 06 2017 Laura Abbott
- Disable debugging options.

* Fri Mar 03 2017 Laura Abbott - 4.11.0-0.rc0.git9.1
- Linux v4.10-11319-gc82be9d

so the delta is to 'v4.10-11319-gc82be9d', but I dunno what that means exactly.

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Bug 194837] New: VM with virtio-scsi drive often crashes during boot with kernel 4.11rc1

2017-03-09 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=194837

Bug ID: 194837
   Summary: VM with virtio-scsi drive often crashes during boot
with kernel 4.11rc1
   Product: IO/Storage
   Version: 2.5
Kernel Version: 4.11rc1
  Hardware: All
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: blocking
  Priority: P1
 Component: SCSI
  Assignee: linux-scsi@vger.kernel.org
  Reporter: ad...@happyassassin.net
Regression: No

In Fedora, we use the openQA automated test system, which runs qemu VMs and
does tests in them. By default, it attaches optical disc images to the test VM
using a virtio-scsi drive.

When Fedora 26 and Rawhide's kernels went from kernel-4.11.0-0.rc0.git9.1 to
kernel-4.11.0-0.rc1.git0.1 , many openQA tests suddenly started failing because
at some point in the test, the VM would fail to boot properly, with a kernel
error and traceback often displayed (sometimes the screen would just be blank).
I've seen three variants on this failure so far. Two have identical-looking
tracebacks but a slightly different error message:

https://openqa.fedoraproject.org/tests/60571#step/_console_wait_login/7
https://openqa.fedoraproject.org/tests/60572#step/_console_wait_login/6

note that one error is 'unable to handle kernel paging request' and the other
is 'unable to handle kernel NULL pointer dereference', but the tracebacks look
very similar.

Another case shows a somewhat different traceback:

https://openqa.fedoraproject.org/tests/60574#step/_console_wait_login/4

but doesn't show an error message (it may just be in the backscroll,
unfortunately there's no way to recover it from that test now). The traceback
is still in SCSI code, however.

I can reproduce this problem manually using virt-manager, so long as I attach a
SCSI optical drive to the VM (Add Hardware, Device type -> "CDROM device", Bus
type -> "SCSI"). If I use an IDE optical drive (which is the default in
virt-manager), the bug does not occur. So long as a SCSI optical drive is
attached, about half of the attempts to boot the system with the affected
kernel fail. Usually with a traceback looking like the ones from openQA,
sometimes it also just apparently hangs when enumerating SCSI devices (I think
that's what it's doing, the last line is 'scsi 3:0:0:0: CD-ROM QEMU   QEMU
CD-ROM   2.0. PQ: 0 ANSI: 5' or a bit after that).

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: [PATCH net-next] qed*: Utilize Firmware 8.15.3.0

2017-03-09 Thread Christoph Hellwig
We can't just require a new firmware version in the driver, as
users most likely won't have it by the time they install the new
kernel.  So you'll have to support the old firmware version as well.


Re: [PATCH v3 00/14] qla2xxx: Bug Fixes and updates for target.

2017-03-09 Thread Malavali, Giridhar


On 3/8/17, 7:42 PM, "Nicholas A. Bellinger"  wrote:

>Hi Giri,
>
>On Wed, 2017-03-08 at 18:30 +, Malavali, Giridhar wrote:
>> 
>> On 3/8/17, 7:20 AM, "Bart Van Assche" 
>>wrote:
>> 
>> >On Tue, 2017-03-07 at 23:34 -0800, Nicholas A. Bellinger wrote:
>> >> Btw, the regression reported here in v2:
>> >> 
>> >> http://www.spinics.net/lists/target-devel/msg14348.html
>> >> 
>> >> is completely different from what you've reported here.
>> >
>> >The call traces differ but the root cause is probably the same.
>> >
>> >> It would be useful to explain how you reproduced this, instead of
>>just
>> >> posting backtrace with zero context..?
>> >> 
>> >> Can we at least identify which patch in this series is causing
>>this..?
>> >> 
>> >> Also, I assume you are running this on stock v4.11-rc1 with only this
>> >> qla2xxx series applied, and not all of your other stuff, right..?
>> >
>> >The test I ran against v4.11-rc1 + this patch series is to start LIO
>>on a
>> >system equipped with two back-to-back connected QLogic FC HBAs (no
>>switch
>> >inbetween), to load the tcm_qla2xxx driver, to configure LUNs and to
>>wait
>> >until the SCSI stack reports that these LUNs have appeared. What I see
>>in
>> >the lsscsi output with both v2 and v3 of this patch series is that
>>these
>> >LUNs appear briefly and then disappear and that a little bit later the
>> >kernel reports that a hang occurred. Without this patch series the LUNs
>> >are
>> >detected and do not disappear automatically and no hang is reported. I
>> >think the next step is that Cavium verifies whether they can reproduce
>> >this
>> >behavior and if they can reproduce it to run a bisect. BTW, since there
>> >are
>> >login-related patches in this series I wouldn't be surprised if one of
>> >these
>> >patches introduced the regression.
>> 
>> We generally go through switch. We will try to reproduce with back to
>>back
>> and bisect the patches.
>> 
>
>Just a heads up given this series mixes bug-fixes with a few other
>miscellaneous improvements, I'd like to get it pushed to Linus with the
>next round of patches no later than -rc3.
>
>If it's beyond -rc3, then the bug-fixes will need to be split out for
>v4.11-rc separate from the other improvements.

Thanks for the heads-up. We are working on Bart reported issue. We are
able to recreate this internally.
We should be able to wrap it up in next few days and send the final
patches. I don¹t think anything else is outstanding.

‹ Giri

>



Re: [PATCH v2] scsi_sysfs: fix hang when removing scsi device

2017-03-09 Thread Bart Van Assche
On Thu, 2017-03-09 at 18:37 +0200, Israel Rukshin wrote:
> The bug reproduce when unloading srp module with one port down.
> sd_shutdown() hangs when __scsi_remove_device() get scsi_device with
> state SDEV_OFFLINE or SDEV_TRANSPORT_OFFLINE.
> It hangs because sd_shutdown() is trying to send sync cache command
> when the device is offline but with SDEV_CANCEL status.
> The status was changed to SDEV_CANCEL by __scsi_remove_device()
> before it calls to device_del().
> 
> The block layer timeout mechanism doesn't cause the SYNCHRONIZE CACHE
> command to fail after the timeout expired because the request timer
> wasn't started.
> blk_peek_request() that is called from scsi_request_fn() didn't return
> this request and therefore the request timer didn't start.
> 
> This commit doesn't accept new commands if the original state was offline.
> 
> The bug was revealed after commit cff549 ("scsi: proper state checking
> and module refcount handling in scsi_device_get").
> After this commit scsi_device_get() returns error if the device state
> is SDEV_CANCEL.
> This eventually leads SRP fast I/O failure timeout handler not to clean
> the sync cache command because scsi_target_unblock() skip the canceled device.
> If this timeout handler is set to infinity then the hang remains forever
> also before commit cff549.

How could blk_peek_request() not return a request that has not yet been
started? How could a patch that changes scsi_device_get() affect I/O since
scsi_device_get() is not called from the I/O path? Anyway, could you try to
reproduce the hang with the patch below applied and see whether the output
produced by this patch helps to determine what is going on?

Thanks,

Bart.

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ba2286652ff6..855548ff4c4d 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -3018,8 +3018,10 @@ scsi_internal_device_unblock(struct scsi_device *sdev,
    else
    sdev->sdev_state = SDEV_CREATED;
    } else if (sdev->sdev_state != SDEV_CANCEL &&
-    sdev->sdev_state != SDEV_OFFLINE)
+    sdev->sdev_state != SDEV_OFFLINE) {
+   WARN_ONCE(true, "sdev state = %d\n", sdev->sdev_state);
    return -EINVAL;
+   }
 
    if (q->mq_ops) {
    blk_mq_start_stopped_hw_queues(q, false);
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 82dfe07b1d47..35aa6b37e199 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1289,6 +1289,13 @@ void __scsi_remove_device(struct scsi_device *sdev)
    device_unregister(>sdev_dev);
    transport_remove_device(dev);
    scsi_dh_remove_device(sdev);
+
+   WARN_ON_ONCE(blk_queue_stopped(sdev->request_queue));
+   sdev_printk(KERN_INFO, sdev,
+   "%s: device_busy = %d device_blocked = %d\n",
+   __func__, atomic_read(>device_busy),
+   atomic_read(>device_blocked));
+
    device_del(dev);
    } else
    put_device(>sdev_dev);
-- 
2.12.0


Re: [PATCH linux-firmware] qed: Add firmware 8.15.3.0

2017-03-09 Thread Kyle McMartin
On Thu, Mar 09, 2017 at 12:00:49PM +0200, Yuval Mintz wrote:
> This new firmware for the qed* adapters fixes multiple issues:
>  - Prevent accidental drops in Tx-switching between VFs.
>  - Corrct VF cleanup for higher VFs.
>  - Better handling of RoCE error flows.
>  - MR registration/deregistration corrections.
>  - Prevent possible HW attention during offloaded TCP teardown.
>  - Corrections to iSCSI retransmit flows.
> 
> In addition, this firmware is a requirement for supporting the
> QL41xxx series of adapters, as it contains the necessary init
> and firmware logic for those.
> 
> Signed-off-by: Yuval Mintz 

applied, thanks Yuval.

regards, --kyle


RE: [PATCH] qla2xxx: Fix crash in qla2xxx_eh_abort on bad ptr

2017-03-09 Thread Madhani, Himanshu


> -Original Message-
> From: Bill Kuzeja [mailto:william.kuz...@stratus.com]
> Sent: Thursday, March 09, 2017 8:47 AM
> To: linux-scsi@vger.kernel.org
> Cc: qla2xxx-upstr...@qlogic.com; Bill Kuzeja 
> Subject: [PATCH] qla2xxx: Fix crash in qla2xxx_eh_abort on bad ptr
> 
> I've seen this issue only after a Qlogic card breaks upon initialization (one 
> of
> my test cases). After the break, qla2x00_abort_all_cmds gets invoked. This
> routine has a relatively new section introduced by these
> commits:
> 
> commit 1535aa75a3d8 ("scsi: qla2xxx: fix invalid DMA access after command
> aborts in PCI device remove") commit c733ab351243 ("scsi: qla2xxx: do not
> abort all commands in the adapter during EEH recovery") commit
> 2780f3c8f0233 ("scsi: qla2xxx: Avoid that issuing a LIP triggers a kernel 
> crash")
> 
> The section is problematic in certain cases. Here's the if statement in
> question:
> 
> if (GET_CMD_SP(sp) && !ha->flags.eeh_busy) {
>/* Get a reference to the sp and drop the lock.
> * The reference ensures this sp->done() call
> * - and not the call in qla2xxx_eh_abort() -
> * ends the SCSI command (with result 'res').
> */
> sp_get(sp);
> spin_unlock_irqrestore(>hardware_lock,flags);
> qla2xxx_eh_abort(GET_CMD_SP(sp));
> spin_lock_irqsave(>hardware_lock, flags);
> }
> 
> Now here's the panic my test provokes:
> 
> [  927.823858] Call Trace:
> [  927.581661]  qla2xxx_eh_abort+0x19/0x2b0 [qla2xxx] [  927.829269]
> [] qla2x00_abort_all_cmds+0xf6/0x14 [qla2xxx] [
> 927.845014]  []
> qla2x00_disable_board_on_pci_error+0x8f/0x160 [qla2xxx] [  927.863054]
> [] process_one_work+0x17b/0x470 [  927.875916]
> [] worker_thread+0x126/0x410 [  927.888203]
> [] ? rescuer_thread+0x460/0x460 [  927.901067]
> [] kthread+0xcf/0xe0 [  927.911823]  []
> ?kthread_create_on_node+0x140/0x140
> [  927.926224]  [] ret_from_fork+0x58/0x90 [  927.938132]
> [] ?kthread_create_on_node+0x140/0x140
> 
> We crash in qla2xxx_eh_abort trying to get the vha:
> 
>   scsi_qla_host_t *vha = shost_priv(cmd->device->host);
> 
> Closer examination of the crash shows that the value of "cmd" is 2, certainly
> not a valid pointer.
> 
> What's happening here?
> 
> The check at the top of the if-block GET_CMD_SP(sp) implicitly != NULL
> would have prevented this sort of thing. However, since sp->u.scmd.cmd is
> not *quite* null (in my crashes it's usually 2), we fall into the if-block 
> and call
> qla2xxx_eh_abort - and crash trying to get cmd.
> 
> Note that the GET_CMD_SP(sp) is doing this:
> 
> #define GET_CMD_SP(sp) (sp->u.scmd.cmd)
> 
> and is acting upon a union:
> 
> union {
> struct srb_iocb iocb_cmd;
> struct fc_bsg_job *bsg_job;
> struct srb_cmd scmd;
>   }
> 
> The address it's getting is the first element in this structure:
> 
> struct srb_cmd {
>   struct scsi_cmnd *cmd;  /* Linux SCSI command pkt */
> 
> }
> 
> However, since this is a union, the same memory can be used this way
> instead:
> 
> struct srb_iocb {
>   union {
>   struct {
>   uint16_t flags;
>   uint16_t data[2];
>   u32 iop[2];
>   } logio;
>   
> 
> Turns out, in the kernel panics I have gotten, the iocb type is logio 
> (verified
> by srb->type = SRB_LOGIN_CMD).
> 
> To further check, I looked at the logio iocb in the crash:
> 
> logio = {
>   flags = 0x2,
>   data = {0x0, 0x0}
> 
> 
> which follows since:
> 
>lio->u.logio.flags |= SRB_LOGIN_COND_PLOGI;
> 
> and
> 
> #define SRB_LOGIN_COND_PLOGI  BIT_1
> 
> In order to eliminate this crash, this patch adds an extra check to the top of
> the if statement to check whether or not sp->type == SRB_SCSI_CMD.
> If not, then don't bother doing the rest of the if-block. It doesn't look 
> like we
> should be invoking qla2xxx_eh_abort for anything other than srb_cmds
> anyways.
> 
> I thought about changing the definition of GET_CMD_SP to include this type
> check and return NULL if sp is not type SRB_SCSI_CMD - like this:
> 
> #define GET_CMD_SP(sp) ((sp->type == SRB_SCSI_CMD) ? sp->u.scmd.cmd
> : NULL)
> 
> I decided against it as there are multiple places in the code that do not 
> check
> for NULL. If you're calling GET_CMD_SP you should be dealing with an
> SRB_SCSI_CMDbut we aren't in this case. So, for this patch I went with the
> more contained and safer change.
> 
> This problem is hard to hit, but I have run into it doing negative testing 
> many
> times now (with a test specifically designed to bring it out), so I can verify
> that this fix works. My testing has been against a RHEL7 driver variant, but
> the bug and patch are equally relevant to to the upstream driver.
> 
> Fixes: 1535aa75a3d8 ("scsi: qla2xxx: fix invalid DMA access after command
> aborts in PCI device remove")

[PATCH] qla2xxx: Fix crash in qla2xxx_eh_abort on bad ptr

2017-03-09 Thread Bill Kuzeja
I've seen this issue only after a Qlogic card breaks upon initialization 
(one of my test cases). After the break, qla2x00_abort_all_cmds gets 
invoked. This routine has a relatively new section introduced by these 
commits:

commit 1535aa75a3d8 ("scsi: qla2xxx: fix invalid DMA access after command 
aborts in PCI device remove")
commit c733ab351243 ("scsi: qla2xxx: do not abort all commands in the adapter 
during EEH recovery")
commit 2780f3c8f0233 ("scsi: qla2xxx: Avoid that issuing a LIP triggers a 
kernel crash") 

The section is problematic in certain cases. Here's the if statement in 
question:

if (GET_CMD_SP(sp) && !ha->flags.eeh_busy) {
   /* Get a reference to the sp and drop the lock.
* The reference ensures this sp->done() call
* - and not the call in qla2xxx_eh_abort() -
* ends the SCSI command (with result 'res').
*/
sp_get(sp);
spin_unlock_irqrestore(>hardware_lock,flags);
qla2xxx_eh_abort(GET_CMD_SP(sp));
spin_lock_irqsave(>hardware_lock, flags);
}

Now here's the panic my test provokes:

[  927.823858] Call Trace:
[  927.581661]  qla2xxx_eh_abort+0x19/0x2b0 [qla2xxx]
[  927.829269]  [] qla2x00_abort_all_cmds+0xf6/0x14 [qla2xxx]
[  927.845014]  [] 
qla2x00_disable_board_on_pci_error+0x8f/0x160 [qla2xxx]
[  927.863054]  [] process_one_work+0x17b/0x470
[  927.875916]  [] worker_thread+0x126/0x410
[  927.888203]  [] ? rescuer_thread+0x460/0x460
[  927.901067]  [] kthread+0xcf/0xe0
[  927.911823]  [] ?kthread_create_on_node+0x140/0x140
[  927.926224]  [] ret_from_fork+0x58/0x90
[  927.938132]  [] ?kthread_create_on_node+0x140/0x140

We crash in qla2xxx_eh_abort trying to get the vha:

scsi_qla_host_t *vha = shost_priv(cmd->device->host);

Closer examination of the crash shows that the value of "cmd" is 2,
certainly not a valid pointer.

What's happening here?

The check at the top of the if-block GET_CMD_SP(sp) implicitly != NULL 
would have prevented this sort of thing. However, since sp->u.scmd.cmd 
is not *quite* null (in my crashes it's usually 2), we fall into the 
if-block and call qla2xxx_eh_abort - and crash trying to get cmd.

Note that the GET_CMD_SP(sp) is doing this:

#define GET_CMD_SP(sp) (sp->u.scmd.cmd)

and is acting upon a union:

union {
struct srb_iocb iocb_cmd;
struct fc_bsg_job *bsg_job;
struct srb_cmd scmd;
}

The address it's getting is the first element in this structure:

struct srb_cmd {
struct scsi_cmnd *cmd;  /* Linux SCSI command pkt */

}

However, since this is a union, the same memory can be used this way 
instead:

struct srb_iocb {
union {
struct {
uint16_t flags;
uint16_t data[2];
u32 iop[2];
} logio;


Turns out, in the kernel panics I have gotten, the iocb type is logio
(verified by srb->type = SRB_LOGIN_CMD). 

To further check, I looked at the logio iocb in the crash:

logio = {
  flags = 0x2,
  data = {0x0, 0x0}


which follows since:

   lio->u.logio.flags |= SRB_LOGIN_COND_PLOGI;

and

#define SRB_LOGIN_COND_PLOGIBIT_1

In order to eliminate this crash, this patch adds an extra check to the
top of the if statement to check whether or not sp->type == SRB_SCSI_CMD. 
If not, then don't bother doing the rest of the if-block. It doesn't look
like we should be invoking qla2xxx_eh_abort for anything other than
srb_cmds anyways.

I thought about changing the definition of GET_CMD_SP to include this
type check and return NULL if sp is not type SRB_SCSI_CMD - like this:

#define GET_CMD_SP(sp) ((sp->type == SRB_SCSI_CMD) ? sp->u.scmd.cmd : NULL)

I decided against it as there are multiple places in the code that do not
check for NULL. If you're calling GET_CMD_SP you should be dealing with 
an SRB_SCSI_CMDbut we aren't in this case. So, for this patch I went 
with the more contained and safer change.

This problem is hard to hit, but I have run into it doing negative
testing many times now (with a test specifically designed to bring it
out), so I can verify that this fix works. My testing has been against
a RHEL7 driver variant, but the bug and patch are equally relevant to
to the upstream driver.

Fixes: 1535aa75a3d8 ("scsi: qla2xxx: fix invalid DMA access after command 
aborts in PCI device remove")
Signed-off-by: Bill Kuzeja 
---
 drivers/scsi/qla2xxx/qla_os.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index d01c90c..4eec095 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -1617,7 +1617,8 @@ uint32_t qla2x00_isp_reg_stat(struct qla_hw_data *ha)
/* Don't abort commands in adapter during EEH
 * recovery as it's not 

Re: [PATCH 0/4 v2] block: Fixes for bdi handling

2017-03-09 Thread Jens Axboe
On 03/09/2017 03:16 AM, Jan Kara wrote:
> Hi!
> 
> this is a second revision of the series fixing the most urgent bugs that were
> introduced by commit 165a5e22fafb "block: Move bdi_unregister() to
> del_gendisk()" and by 0dba1314d4f8 "scsi, block: fix duplicate bdi name
> registration crashes".  In fact before these commits we had a different set of
> problems in the code but they were less visible :).

It was rather urgent to get those fixes in, so I already sent them off. Not a
huge deal, but it would be nice to add the atomic init fix as a separate patch
later on.

-- 
Jens Axboe



Re: [PATCH 1/4] block: Allow bdi re-registration

2017-03-09 Thread Tejun Heo
On Thu, Mar 09, 2017 at 11:16:21AM +0100, Jan Kara wrote:
> SCSI can call device_add_disk() several times for one request queue when
> a device in unbound and bound, creating new gendisk each time. This will
> lead to bdi being repeatedly registered and unregistered. This was not a
> big problem until commit 165a5e22fafb "block: Move bdi_unregister() to
> del_gendisk()" since bdi was only registered repeatedly (bdi_register()
> handles repeated calls fine, only we ended up leaking reference to
> gendisk due to overwriting bdi->owner) but unregistered only in
> blk_cleanup_queue() which didn't get called repeatedly. After
> 165a5e22fafb we were doing correct bdi_register() - bdi_unregister()
> cycles however bdi_unregister() is not prepared for it. So make sure
> bdi_unregister() cleans up bdi in such a way that it is prepared for
> a possible following bdi_register() call.
> 
> An easy way to provoke this behavior is to enable
> CONFIG_DEBUG_TEST_DRIVER_REMOVE and use scsi_debug driver to create a
> scsi disk which immediately hangs without this fix.
> 
> Fixes: 165a5e22fafb127ecb5914e12e8c32a1f0d3f820
> Tested-by: Omar Sandoval 
> Signed-off-by: Jan Kara 

Acked-by: Tejun Heo 

Thanks!

-- 
tejun


[PATCH v2] scsi_sysfs: fix hang when removing scsi device

2017-03-09 Thread Israel Rukshin
The bug reproduce when unloading srp module with one port down.
sd_shutdown() hangs when __scsi_remove_device() get scsi_device with
state SDEV_OFFLINE or SDEV_TRANSPORT_OFFLINE.
It hangs because sd_shutdown() is trying to send sync cache command
when the device is offline but with SDEV_CANCEL status.
The status was changed to SDEV_CANCEL by __scsi_remove_device()
before it calls to device_del().

The block layer timeout mechanism doesn't cause the SYNCHRONIZE CACHE
command to fail after the timeout expired because the request timer
wasn't started.
blk_peek_request() that is called from scsi_request_fn() didn't return
this request and therefore the request timer didn't start.

This commit doesn't accept new commands if the original state was offline.

The bug was revealed after commit cff549 ("scsi: proper state checking
and module refcount handling in scsi_device_get").
After this commit scsi_device_get() returns error if the device state
is SDEV_CANCEL.
This eventually leads SRP fast I/O failure timeout handler not to clean
the sync cache command because scsi_target_unblock() skip the canceled device.
If this timeout handler is set to infinity then the hang remains forever
also before commit cff549.

sysrq: SysRq : sysrq: Show Blocked State
task PC stack pid father
kworker/2:0 D 88046fa95c00 0 21178 2 0x
Workqueue: srp_remove srp_remove_work [ib_srp]
Call Trace:
[] schedule+0x35/0x80
[] schedule_timeout+0x237/0x2d0
[] io_schedule_timeout+0xa6/0x110
[] wait_for_completion_io+0xa3/0x110
[] blk_execute_rq+0xdf/0x120
[] scsi_execute+0xce/0x150 [scsi_mod]
[] scsi_execute_req_flags+0x8f/0xf0 [scsi_mod]
[] sd_sync_cache+0xa9/0x190 [sd_mod]
[] sd_shutdown+0x6a/0x100 [sd_mod]
[] sd_remove+0x64/0xc0 [sd_mod]
[] __device_release_driver+0x8d/0x120
[] device_release_driver+0x1e/0x30
[] bus_remove_device+0xf9/0x170
[] device_del+0x127/0x240
[] __scsi_remove_device+0xc1/0xd0 [scsi_mod]
[] scsi_forget_host+0x57/0x60 [scsi_mod]
[] scsi_remove_host+0x72/0x110 [scsi_mod]
[] srp_remove_work+0x8b/0x200 [ib_srp]
...

Signed-off-by: Israel Rukshin 
Reviewed-by: Max Gurtovoy 
---

Changes from v1:
 - add extra description to commit message and to the comment.
 - refer to the commit that originally introduced this hang.

---
 drivers/scsi/scsi_sysfs.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 82dfe07..8a977f5 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1282,6 +1282,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
return;
 
if (sdev->is_visible) {
+   enum scsi_device_state oldstate = sdev->sdev_state;
+
if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
return;
 
@@ -1289,6 +1291,17 @@ void __scsi_remove_device(struct scsi_device *sdev)
device_unregister(>sdev_dev);
transport_remove_device(dev);
scsi_dh_remove_device(sdev);
+
+   /*
+* Fail new requests if the old state was offline.
+* This avoids from sd_shutdown() to hang.
+* The SYNCHRONIZE CACHE request timer will never start
+* in that case.
+*/
+   if (oldstate == SDEV_TRANSPORT_OFFLINE ||
+   oldstate == SDEV_OFFLINE)
+   blk_set_queue_dying(sdev->request_queue);
+
device_del(dev);
} else
put_device(>sdev_dev);
-- 
2.4.3



Re: [PATCH] target/user: Fix possible overwrite of t_data_sg'slastiov[]

2017-03-09 Thread Ilias Tsitsimpis
On Thu, Mar 09, 2017 at 10:08PM, 李秀波 wrote:
> Any other advice about this change for your case?

I took a look at the patch you sent and it seems reasonable to me.
I didn't have the time to test it, but I will try to update to the
latest version (probably by next week) and report back.

Thanks for your work :)

-- 
Ilias


Re: [PATCH] target/user: Fix possible overwrite of t_data_sg's lastiov[]

2017-03-09 Thread Ilias Tsitsimpis
Hi Andy, Xiubo,

On Fri, Mar 03, 2017 at 11:00AM, Andy Grover wrote:
> On 02/27/2017 09:47 PM, lixi...@cmss.chinamobile.com wrote:
> > From: Xiubo Li 
> > 
> > If there has BIDI data, its first iov[] will overwrite the last
> > iov[] for se_cmd->t_data_sg.
> 
> (+CCing orig BIDI and data block code authors)
> 
> Yeah. It looks like because alloc_and_scatter_data_area() (hereafter
> "aasda") is called twice in the BIDI case, and both times iov_cnt is 0, the
> new_iov() call doesn't increment the iov ptr and the first bidi iov
> overwrites the last data iov. Maybe fix this by exiting aasda() with iov
> pointing at the next unused iov in the array? Probably also want to zero the
> iov.

Yes this is definitely a bug, thanks for catching this. This seems to
have been introduced around the time new_iov() was introduced, and hence
it has been broken since v4.6.

On Mon, Mar 06, 2017 at 10:09AM, Andy Grover wrote:
> On 03/05/2017 09:48 PM, Xiubo Li wrote:
> > For the BIDI data, still hasn't been used by the tcmu-runner. Is any
> > other consumer using this?
> 
> Well with kernel APIs you just never know if somebody's using it but just
> not saying anything. But given this bug, how could they?

I added support for BIDI commands initially, as I needed to have SCSI
OSD working over tcmu and the OSD protocol requires BIDI support and I
am definitely using it now. Unfortunately, I haven't been able to
closely follow the development of target/user lately, so I failed to
notice this breakage in time. Sorry about that.

-- 
Ilias


[PATCH] drivers, scsi: convert iscsi_task.refcount from atomic_t to refcount_t

2017-03-09 Thread Elena Reshetova
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
Acked-by: Chris Leech 
---
 drivers/scsi/libiscsi.c| 8 
 drivers/scsi/qedi/qedi_iscsi.c | 2 +-
 include/scsi/libiscsi.h| 3 ++-
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 834d121..7eb1d2c 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -516,13 +516,13 @@ static void iscsi_free_task(struct iscsi_task *task)
 
 void __iscsi_get_task(struct iscsi_task *task)
 {
-   atomic_inc(>refcount);
+   refcount_inc(>refcount);
 }
 EXPORT_SYMBOL_GPL(__iscsi_get_task);
 
 void __iscsi_put_task(struct iscsi_task *task)
 {
-   if (atomic_dec_and_test(>refcount))
+   if (refcount_dec_and_test(>refcount))
iscsi_free_task(task);
 }
 EXPORT_SYMBOL_GPL(__iscsi_put_task);
@@ -744,7 +744,7 @@ __iscsi_conn_send_pdu(struct iscsi_conn *conn, struct 
iscsi_hdr *hdr,
 * released by the lld when it has transmitted the task for
 * pdus we do not expect a response for.
 */
-   atomic_set(>refcount, 1);
+   refcount_set(>refcount, 1);
task->conn = conn;
task->sc = NULL;
INIT_LIST_HEAD(>running);
@@ -1616,7 +1616,7 @@ static inline struct iscsi_task *iscsi_alloc_task(struct 
iscsi_conn *conn,
sc->SCp.phase = conn->session->age;
sc->SCp.ptr = (char *) task;
 
-   atomic_set(>refcount, 1);
+   refcount_set(>refcount, 1);
task->state = ISCSI_TASK_PENDING;
task->conn = conn;
task->sc = sc;
diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c
index b9f79d3..3895bd5 100644
--- a/drivers/scsi/qedi/qedi_iscsi.c
+++ b/drivers/scsi/qedi/qedi_iscsi.c
@@ -1372,7 +1372,7 @@ static void qedi_cleanup_task(struct iscsi_task *task)
 {
if (!task->sc || task->state == ISCSI_TASK_PENDING) {
QEDI_INFO(NULL, QEDI_LOG_IO, "Returning ref_cnt=%d\n",
- atomic_read(>refcount));
+ refcount_read(>refcount));
return;
}
 
diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h
index b0e275d..24d74b5 100644
--- a/include/scsi/libiscsi.h
+++ b/include/scsi/libiscsi.h
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -139,7 +140,7 @@ struct iscsi_task {
 
/* state set/tested under session->lock */
int state;
-   atomic_trefcount;
+   refcount_t  refcount;
struct list_headrunning;/* running cmd list */
void*dd_data;   /* driver/transport data */
 };
-- 
2.7.4



[PATCH 4/4] Revert "scsi, block: fix duplicate bdi name registration crashes"

2017-03-09 Thread Jan Kara
This reverts commit 0dba1314d4f81115dce711292ec7981d17231064. It causes
leaking of device numbers for SCSI when SCSI registers multiple gendisks
for one request_queue in succession. It can be easily reproduced using
Omar's script [1] on kernel with CONFIG_DEBUG_TEST_DRIVER_REMOVE.
Furthermore the protection provided by this commit is not needed anymore
as the problem it was fixing got also fixed by commit 165a5e22fafb
"block: Move bdi_unregister() to del_gendisk()".

[1]: http://marc.info/?l=linux-block=148554717109098=2

Tested-by: Omar Sandoval 
Acked-by: Dan Williams 
Signed-off-by: Jan Kara 
---
 block/blk-core.c   |  2 --
 block/genhd.c  | 21 -
 drivers/scsi/sd.c  | 41 -
 include/linux/blkdev.h |  1 -
 include/linux/genhd.h  |  8 
 5 files changed, 8 insertions(+), 65 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 1086dac8724c..a76895c9776d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -578,8 +578,6 @@ void blk_cleanup_queue(struct request_queue *q)
q->queue_lock = >__queue_lock;
spin_unlock_irq(lock);
 
-   put_disk_devt(q->disk_devt);
-
/* @q is and will stay empty, shutdown and put */
blk_put_queue(q);
 }
diff --git a/block/genhd.c b/block/genhd.c
index 94f323842b52..a9c516a8b37d 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -572,20 +572,6 @@ static void register_disk(struct device *parent, struct 
gendisk *disk)
disk_part_iter_exit();
 }
 
-void put_disk_devt(struct disk_devt *disk_devt)
-{
-   if (disk_devt && atomic_dec_and_test(_devt->count))
-   disk_devt->release(disk_devt);
-}
-EXPORT_SYMBOL(put_disk_devt);
-
-void get_disk_devt(struct disk_devt *disk_devt)
-{
-   if (disk_devt)
-   atomic_inc(_devt->count);
-}
-EXPORT_SYMBOL(get_disk_devt);
-
 /**
  * device_add_disk - add partitioning information to kernel list
  * @parent: parent device for the disk
@@ -626,13 +612,6 @@ void device_add_disk(struct device *parent, struct gendisk 
*disk)
 
disk_alloc_events(disk);
 
-   /*
-* Take a reference on the devt and assign it to queue since it
-* must not be reallocated while the bdi is registered
-*/
-   disk->queue->disk_devt = disk->disk_devt;
-   get_disk_devt(disk->disk_devt);
-
/* Register BDI before referencing it from bdev */
bdi = disk->queue->backing_dev_info;
bdi_register_owner(bdi, disk_to_dev(disk));
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index c7839f6c35cc..d277e8620e3e 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3075,23 +3075,6 @@ static void sd_probe_async(void *data, async_cookie_t 
cookie)
put_device(>dev);
 }
 
-struct sd_devt {
-   int idx;
-   struct disk_devt disk_devt;
-};
-
-static void sd_devt_release(struct disk_devt *disk_devt)
-{
-   struct sd_devt *sd_devt = container_of(disk_devt, struct sd_devt,
-   disk_devt);
-
-   spin_lock(_index_lock);
-   ida_remove(_index_ida, sd_devt->idx);
-   spin_unlock(_index_lock);
-
-   kfree(sd_devt);
-}
-
 /**
  * sd_probe - called during driver initialization and whenever a
  * new scsi device is attached to the system. It is called once
@@ -3113,7 +3096,6 @@ static void sd_devt_release(struct disk_devt *disk_devt)
 static int sd_probe(struct device *dev)
 {
struct scsi_device *sdp = to_scsi_device(dev);
-   struct sd_devt *sd_devt;
struct scsi_disk *sdkp;
struct gendisk *gd;
int index;
@@ -3139,13 +3121,9 @@ static int sd_probe(struct device *dev)
if (!sdkp)
goto out;
 
-   sd_devt = kzalloc(sizeof(*sd_devt), GFP_KERNEL);
-   if (!sd_devt)
-   goto out_free;
-
gd = alloc_disk(SD_MINORS);
if (!gd)
-   goto out_free_devt;
+   goto out_free;
 
do {
if (!ida_pre_get(_index_ida, GFP_KERNEL))
@@ -3161,11 +3139,6 @@ static int sd_probe(struct device *dev)
goto out_put;
}
 
-   atomic_set(_devt->disk_devt.count, 1);
-   sd_devt->disk_devt.release = sd_devt_release;
-   sd_devt->idx = index;
-   gd->disk_devt = _devt->disk_devt;
-
error = sd_format_disk_name("sd", index, gd->disk_name, DISK_NAME_LEN);
if (error) {
sdev_printk(KERN_WARNING, sdp, "SCSI disk (sd) name length 
exceeded.\n");
@@ -3205,12 +3178,11 @@ static int sd_probe(struct device *dev)
return 0;
 
  out_free_index:
-   put_disk_devt(_devt->disk_devt);
-   sd_devt = NULL;
+   spin_lock(_index_lock);
+   ida_remove(_index_ida, index);
+   spin_unlock(_index_lock);
  out_put:
put_disk(gd);
- out_free_devt:
-   kfree(sd_devt);
  out_free:
kfree(sdkp);
  out:
@@ -3271,7 +3243,10 @@ static void 

[PATCH 1/4] block: Allow bdi re-registration

2017-03-09 Thread Jan Kara
SCSI can call device_add_disk() several times for one request queue when
a device in unbound and bound, creating new gendisk each time. This will
lead to bdi being repeatedly registered and unregistered. This was not a
big problem until commit 165a5e22fafb "block: Move bdi_unregister() to
del_gendisk()" since bdi was only registered repeatedly (bdi_register()
handles repeated calls fine, only we ended up leaking reference to
gendisk due to overwriting bdi->owner) but unregistered only in
blk_cleanup_queue() which didn't get called repeatedly. After
165a5e22fafb we were doing correct bdi_register() - bdi_unregister()
cycles however bdi_unregister() is not prepared for it. So make sure
bdi_unregister() cleans up bdi in such a way that it is prepared for
a possible following bdi_register() call.

An easy way to provoke this behavior is to enable
CONFIG_DEBUG_TEST_DRIVER_REMOVE and use scsi_debug driver to create a
scsi disk which immediately hangs without this fix.

Fixes: 165a5e22fafb127ecb5914e12e8c32a1f0d3f820
Tested-by: Omar Sandoval 
Signed-off-by: Jan Kara 
---
 mm/backing-dev.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 6d861d090e9f..51325489aae5 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -710,6 +710,11 @@ static void cgwb_bdi_destroy(struct backing_dev_info *bdi)
 */
atomic_dec(>usage_cnt);
wait_event(cgwb_release_wait, !atomic_read(>usage_cnt));
+   /*
+* Reinitialize usage_cnt so that we hold reference when @bdi gets
+* re-registered.
+*/
+   atomic_set(>usage_cnt, 1);
 }
 
 /**
@@ -857,6 +862,8 @@ int bdi_register_owner(struct backing_dev_info *bdi, struct 
device *owner)
MINOR(owner->devt));
if (rc)
return rc;
+   /* Leaking owner reference... */
+   WARN_ON(bdi->owner);
bdi->owner = owner;
get_device(owner);
return 0;
-- 
2.10.2



[PATCH 0/4 v2] block: Fixes for bdi handling

2017-03-09 Thread Jan Kara
Hi!

this is a second revision of the series fixing the most urgent bugs that were
introduced by commit 165a5e22fafb "block: Move bdi_unregister() to
del_gendisk()" and by 0dba1314d4f8 "scsi, block: fix duplicate bdi name
registration crashes".  In fact before these commits we had a different set of
problems in the code but they were less visible :).

Changes since v1:
* Added Acked-by and Tested-by tags
* Changed usage_cnt increment to reinitialize as requested by Tejun and James

Honza


[PATCH 2/4] bdi: Fix use-after-free in wb_congested_put()

2017-03-09 Thread Jan Kara
bdi_writeback_congested structures get created for each blkcg and bdi
regardless whether bdi is registered or not. When they are created in
unregistered bdi and the request queue (and thus bdi) is then destroyed
while blkg still holds reference to bdi_writeback_congested structure,
this structure will be referencing freed bdi and last wb_congested_put()
will try to remove the structure from already freed bdi.

With commit 165a5e22fafb "block: Move bdi_unregister() to
del_gendisk()", SCSI started to destroy bdis without calling
bdi_unregister() first (previously it was calling bdi_unregister() even
for unregistered bdis) and thus the code detaching
bdi_writeback_congested in cgwb_bdi_destroy() was not triggered and we
started hitting this use-after-free bug. It is enough to boot a KVM
instance with virtio-scsi device to trigger this behavior.

Fix the problem by detaching bdi_writeback_congested structures in
bdi_exit() instead of bdi_unregister(). This is also more logical as
they can get attached to bdi regardless whether it ever got registered
or not.

Fixes: 165a5e22fafb127ecb5914e12e8c32a1f0d3f820
Tested-by: Omar Sandoval 
Acked-by: Tejun Heo 
Signed-off-by: Jan Kara 
---
 mm/backing-dev.c | 36 +---
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 51325489aae5..b05ace3ba178 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -683,30 +683,18 @@ static int cgwb_bdi_init(struct backing_dev_info *bdi)
 static void cgwb_bdi_destroy(struct backing_dev_info *bdi)
 {
struct radix_tree_iter iter;
-   struct rb_node *rbn;
void **slot;
 
WARN_ON(test_bit(WB_registered, >wb.state));
 
spin_lock_irq(_lock);
-
radix_tree_for_each_slot(slot, >cgwb_tree, , 0)
cgwb_kill(*slot);
-
-   while ((rbn = rb_first(>cgwb_congested_tree))) {
-   struct bdi_writeback_congested *congested =
-   rb_entry(rbn, struct bdi_writeback_congested, rb_node);
-
-   rb_erase(rbn, >cgwb_congested_tree);
-   congested->bdi = NULL;  /* mark @congested unlinked */
-   }
-
spin_unlock_irq(_lock);
 
/*
-* All cgwb's and their congested states must be shutdown and
-* released before returning.  Drain the usage counter to wait for
-* all cgwb's and cgwb_congested's ever created on @bdi.
+* All cgwb's must be shutdown and released before returning.  Drain
+* the usage counter to wait for all cgwb's ever created on @bdi.
 */
atomic_dec(>usage_cnt);
wait_event(cgwb_release_wait, !atomic_read(>usage_cnt));
@@ -754,6 +742,21 @@ void wb_blkcg_offline(struct blkcg *blkcg)
spin_unlock_irq(_lock);
 }
 
+static void cgwb_bdi_exit(struct backing_dev_info *bdi)
+{
+   struct rb_node *rbn;
+
+   spin_lock_irq(_lock);
+   while ((rbn = rb_first(>cgwb_congested_tree))) {
+   struct bdi_writeback_congested *congested =
+   rb_entry(rbn, struct bdi_writeback_congested, rb_node);
+
+   rb_erase(rbn, >cgwb_congested_tree);
+   congested->bdi = NULL;  /* mark @congested unlinked */
+   }
+   spin_unlock_irq(_lock);
+}
+
 #else  /* CONFIG_CGROUP_WRITEBACK */
 
 static int cgwb_bdi_init(struct backing_dev_info *bdi)
@@ -774,7 +777,9 @@ static int cgwb_bdi_init(struct backing_dev_info *bdi)
return 0;
 }
 
-static void cgwb_bdi_destroy(struct backing_dev_info *bdi)
+static void cgwb_bdi_destroy(struct backing_dev_info *bdi) { }
+
+static void cgwb_bdi_exit(struct backing_dev_info *bdi)
 {
wb_congested_put(bdi->wb_congested);
 }
@@ -905,6 +910,7 @@ static void bdi_exit(struct backing_dev_info *bdi)
 {
WARN_ON_ONCE(bdi->dev);
wb_exit(>wb);
+   cgwb_bdi_exit(bdi);
 }
 
 static void release_bdi(struct kref *ref)
-- 
2.10.2



[PATCH 3/4] block: Make del_gendisk() safer for disks without queues

2017-03-09 Thread Jan Kara
Commit 165a5e22fafb "block: Move bdi_unregister() to del_gendisk()"
added disk->queue dereference to del_gendisk(). Although del_gendisk()
is not supposed to be called without disk->queue valid and
blk_unregister_queue() warns in that case, this change will make it oops
instead. Return to the old more robust behavior of just warning when
del_gendisk() gets called for gendisk with disk->queue being NULL.

Reported-by: Dan Carpenter 
Tested-by: Omar Sandoval 
Acked-by: Tejun Heo 
Signed-off-by: Jan Kara 
---
 block/genhd.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index b26a5ea115d0..94f323842b52 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -681,12 +681,16 @@ void del_gendisk(struct gendisk *disk)
disk->flags &= ~GENHD_FL_UP;
 
sysfs_remove_link(_to_dev(disk)->kobj, "bdi");
-   /*
-* Unregister bdi before releasing device numbers (as they can get
-* reused and we'd get clashes in sysfs).
-*/
-   bdi_unregister(disk->queue->backing_dev_info);
-   blk_unregister_queue(disk);
+   if (disk->queue) {
+   /*
+* Unregister bdi before releasing device numbers (as they can
+* get reused and we'd get clashes in sysfs).
+*/
+   bdi_unregister(disk->queue->backing_dev_info);
+   blk_unregister_queue(disk);
+   } else {
+   WARN_ON(1);
+   }
blk_unregister_region(disk_devt(disk), disk->minors);
 
part_stat_set_all(>part0, 0);
-- 
2.10.2



Re: [PATCH v5 04/19] net: e100: Replace PCI pool old API

2017-03-09 Thread Romain Perier
Hello,


Le 09/03/2017 à 08:01, Peter Senna Tschudin a écrit :
> On Wed, Mar 08, 2017 at 02:40:25PM -0800, Jeff Kirsher wrote:
>> On Wed, 2017-03-08 at 17:19 +0100, Romain Perier wrote:
>>> The PCI pool API is deprecated. This commit replaces the PCI pool old
>>> API by the appropriate function with the DMA pool API.
>>>
>>> Signed-off-by: Romain Perier 
>>> Acked-by: Peter Senna Tschudin 
>>> Tested-by: Peter Senna Tschudin 
>>> ---
>>>  drivers/net/ethernet/intel/e100.c | 12 ++--
>>>  1 file changed, 6 insertions(+), 6 deletions(-)
>> Acked-by: Jeff Kirsher 
>>
>> My only concern is:
>> - what hardware did this get tested with?  Since this affects all e100
>> parts, it would be hard to believe that all the affected hardware was
>> used in testing.
> This was tested by compilation only(See
> https://lkml.org/lkml/2017/2/8/661). However this series removes macro
> definitions of the old pci_pool interface and replace call sites by what
> the macra was calling.
>
> Here are the macros that this series removes from include/pci.h:
>
> #define pci_pool dma_pool
> #define pci_pool_create(name, pdev, size, align, allocation) \
>   dma_pool_create(name, >dev, size, align, allocation)
> #define pci_pool_destroy(pool) dma_pool_destroy(pool)
> #define pci_pool_alloc(pool, flags, handle) dma_pool_alloc(pool, flags, 
> handle)
> #define pci_pool_zalloc(pool, flags, handle) \
>   dma_pool_zalloc(pool, flags, handle)
> #define pci_pool_free(pool, vaddr, addr) dma_pool_free(pool, vaddr, add
>
> So this should not affect run time.
We cannot test a patch like this one on all affected platforms/drivers
(at runtime). Simply because we have not the hw. As Peter said, we
tested this by compilation only for now via make allyesconfig. That's up
to the maintainer of the subsystem to test and ack this, imho.
I agree with Peter, this should not affect runtime (as semantically it's
compatible and have been validated statically and semantically by your
compiler)

Regards,
Romain


Re: [PATCH 22/29] drivers, scsi: convert iscsi_task.refcount from atomic_t to refcount_t

2017-03-09 Thread Johannes Thumshirn
On 03/09/2017 10:26 AM, Reshetova, Elena wrote:
> 
>> On 03/09/2017 08:18 AM, Reshetova, Elena wrote:
 On Mon, Mar 06, 2017 at 04:21:09PM +0200, Elena Reshetova wrote:
> refcount_t type and corresponding API should be
> used instead of atomic_t when the variable is used as
> a reference counter. This allows to avoid accidental
> refcounter overflows that might lead to use-after-free
> situations.
>
> Signed-off-by: Elena Reshetova 
> Signed-off-by: Hans Liljestrand 
> Signed-off-by: Kees Cook 
> Signed-off-by: David Windsor 

 This looks OK to me.

 Acked-by: Chris Leech 
>>>
>>> Thank you for review! Do you have a tree that can take this change?
>>
>> Hi Elena,
>>
>> iscsi like fcoe should go via the SCSI tree.
> 
> Thanks Johannes! Should I resend with "Acked-by" added in order for it to be 
> picked up? 

Yes I think this would be a good way to go.

Byte,
Johannes
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


RE: [PATCH 22/29] drivers, scsi: convert iscsi_task.refcount from atomic_t to refcount_t

2017-03-09 Thread Reshetova, Elena

> On 03/09/2017 08:18 AM, Reshetova, Elena wrote:
> >> On Mon, Mar 06, 2017 at 04:21:09PM +0200, Elena Reshetova wrote:
> >>> refcount_t type and corresponding API should be
> >>> used instead of atomic_t when the variable is used as
> >>> a reference counter. This allows to avoid accidental
> >>> refcounter overflows that might lead to use-after-free
> >>> situations.
> >>>
> >>> Signed-off-by: Elena Reshetova 
> >>> Signed-off-by: Hans Liljestrand 
> >>> Signed-off-by: Kees Cook 
> >>> Signed-off-by: David Windsor 
> >>
> >> This looks OK to me.
> >>
> >> Acked-by: Chris Leech 
> >
> > Thank you for review! Do you have a tree that can take this change?
> 
> Hi Elena,
> 
> iscsi like fcoe should go via the SCSI tree.

Thanks Johannes! Should I resend with "Acked-by" added in order for it to be 
picked up? 

Best Regards,
Elena.


> 
> Byte,
>   Johannes
> 
> --
> Johannes Thumshirn  Storage
> jthumsh...@suse.de+49 911 74053 689
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH 0/4] block: Fixes for bdi handling

2017-03-09 Thread Arthur Marsh



Jan Kara wrote on 09/03/17 03:18:

Hi!

patches in this series fix the most urgent bugs that were introduced by commit
165a5e22fafb "block: Move bdi_unregister() to del_gendisk()" and by
0dba1314d4f8 "scsi, block: fix duplicate bdi name registration crashes".
In fact before these commits we had a different set of problems in the
code but they were less visible :).

I'm still waiting for test confirmation from Omar and Arthur Marsh who reported
issues but I'm not able to hit any problem anymore in my testing.  I think it
would be nice to get the patches to rc2 so to speed up things I'm posting the
patches now so that review can happen in parallel with the testing.

Other BDI handling fixes I have in my queue can wait a bit more since they are
either theoretical or long-standing issues. So I'll repost them once these four
are sorted out.

Honza



Sorry for the delay in replying, I had to leave the kernel with all 4 
patches applied rebuilding while I was at work and just booted it.


I've only done a kexec reboot so far but there were no problems - no 
errors in dmesg, all disks were recognised and all attempted mounts worked.


Thanks very much for the quick fix!

Arthur.


Re: [PATCH 1/4] block: Allow bdi re-registration

2017-03-09 Thread Jan Kara
On Wed 08-03-17 17:55:42, Tejun Heo wrote:
> Hello,
> 
> On Wed, Mar 08, 2017 at 05:48:31PM +0100, Jan Kara wrote:
> > @@ -710,6 +710,11 @@ static void cgwb_bdi_destroy(struct backing_dev_info 
> > *bdi)
> >  */
> > atomic_dec(>usage_cnt);
> > wait_event(cgwb_release_wait, !atomic_read(>usage_cnt));
> > +   /*
> > +* Grab back our reference so that we hold it when @bdi gets
> > +* re-registered.
> > +*/
> > +   atomic_inc(>usage_cnt);
> 
> So, this is more re-initializing the ref to the initial state so that
> it can be re-used, right?  Maybe ATOMIC_INIT() is a better choice here
> just to clarify what's going on?

OK, I was somewhat undecided between these two option but you and James are
probably right that re-init is clearer.

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH 22/29] drivers, scsi: convert iscsi_task.refcount from atomic_t to refcount_t

2017-03-09 Thread Johannes Thumshirn
On 03/09/2017 08:18 AM, Reshetova, Elena wrote:
>> On Mon, Mar 06, 2017 at 04:21:09PM +0200, Elena Reshetova wrote:
>>> refcount_t type and corresponding API should be
>>> used instead of atomic_t when the variable is used as
>>> a reference counter. This allows to avoid accidental
>>> refcounter overflows that might lead to use-after-free
>>> situations.
>>>
>>> Signed-off-by: Elena Reshetova 
>>> Signed-off-by: Hans Liljestrand 
>>> Signed-off-by: Kees Cook 
>>> Signed-off-by: David Windsor 
>>
>> This looks OK to me.
>>
>> Acked-by: Chris Leech 
> 
> Thank you for review! Do you have a tree that can take this change? 

Hi Elena,

iscsi like fcoe should go via the SCSI tree.

Byte,
Johannes

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850