[PATCH 6/6] Fix unsafe fw_event_list usage

2015-06-08 Thread Calvin Owens
Since the fw_event deletes itself from the list, cleanup_queue() can walk onto garbage pointers or walk off into freed memory. This refactors the code in _scsih_fw_event_cleanup_queue() to not iterate over the fw_event_list without a lock. Signed-off-by: Calvin Owens calvinow...@fb.com
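The hazard described above can be avoided by detaching one entry at a time under the lock, instead of walking the list unlocked. The snippet is truncated, so the following is only a minimal userspace sketch of that general pattern and not the patch's code: `struct event`, `struct event_queue`, `dequeue_next()` and `cleanup_queue()` are hypothetical names, and a pthread mutex stands in for whatever lock the driver uses.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued firmware event. */
struct event {
    int id;
    int cancelled;
    struct event *next;
};

/* Hypothetical queue: a singly linked list protected by one lock. */
struct event_queue {
    pthread_mutex_t lock;
    struct event *head;
};

/* Detach and return the first event, or NULL if the queue is empty.
 * The list is only ever touched while the lock is held. */
struct event *dequeue_next(struct event_queue *q)
{
    struct event *e;

    pthread_mutex_lock(&q->lock);
    e = q->head;
    if (e) {
        q->head = e->next;
        e->next = NULL;
    }
    pthread_mutex_unlock(&q->lock);
    return e;
}

/* Drain the queue without ever iterating over it unlocked.
 * Returns the number of events that were cancelled. */
int cleanup_queue(struct event_queue *q)
{
    struct event *e;
    int n = 0;

    while ((e = dequeue_next(q)) != NULL) {
        e->cancelled = 1;   /* stand-in for cancelling the work item */
        n++;
    }
    return n;
}
```

Because each event is off the list before it is touched, a worker that concurrently removes entries can no longer leave the cleanup loop holding a stale `next` pointer.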

[PATCH 2/6] Refactor code to use new sas_device refcount

2015-06-08 Thread Calvin Owens
This patch refactors the code in the driver to use the new reference count on the sas_device struct. Signed-off-by: Calvin Owens calvinow...@fb.com --- drivers/scsi/mpt2sas/mpt2sas_base.h | 4 +- drivers/scsi/mpt2sas/mpt2sas_scsih.c | 329 --- drivers/scsi

[PATCH 1/6] Add refcount to sas_device struct

2015-06-08 Thread Calvin Owens
These objects can be referenced concurrently throughout the driver, we need a way to make sure threads can't delete them out from under each other. Signed-off-by: Calvin Owens calvinow...@fb.com --- drivers/scsi/mpt2sas/mpt2sas_base.h | 16 1 file changed, 16 insertions(+) diff
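The diff itself is cut off in this snippet, so the following is only a rough userspace sketch of the get/put idea the message describes. All names (`sas_device_sketch`, `sdev_alloc`, `sdev_get`, `sdev_put`) are hypothetical, and C11 atomics are an assumption standing in for whichever primitive the driver actually uses.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted device object. */
struct sas_device_sketch {
    atomic_int refcount;              /* number of live references */
    unsigned long long sas_address;
};

/* Allocate an object that starts with one reference owned by the caller. */
struct sas_device_sketch *sdev_alloc(unsigned long long addr)
{
    struct sas_device_sketch *s = malloc(sizeof(*s));

    if (!s)
        return NULL;
    atomic_init(&s->refcount, 1);
    s->sas_address = addr;
    return s;
}

/* Take an additional reference. */
void sdev_get(struct sas_device_sketch *s)
{
    atomic_fetch_add(&s->refcount, 1);
}

/* Drop a reference; the last put frees the object.
 * Returns 1 if the object was freed, 0 otherwise. */
int sdev_put(struct sas_device_sketch *s)
{
    if (atomic_fetch_sub(&s->refcount, 1) == 1) {
        free(s);
        return 1;
    }
    return 0;
}
```

A thread that looks an object up takes a reference with `sdev_get()` before using it and calls `sdev_put()` when done, so a concurrent removal only frees the memory once the last user has let go.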

[PATCH 5/6] Refactor code to use new fw_event refcount

2015-06-08 Thread Calvin Owens
This refactors the fw_event code to use the new refcount. Signed-off-by: Calvin Owens calvinow...@fb.com --- drivers/scsi/mpt2sas/mpt2sas_scsih.c | 20 +--- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas

[RESEND][PATCH 0/6] Fixes for memory corruption in mpt2sas

2015-06-08 Thread Calvin Owens
Hello all, This patchset attempts to address problems we've been having with panics due to memory corruption from the mpt2sas driver. I will provide a similar set of fixes for mpt3sas, since we see similar issues there as well. Porting this to mpt3sas will be trivial since the part of the driver

[PATCH 4/6] Add refcount to fw_event_work struct

2015-06-08 Thread Calvin Owens
The fw_event_work struct is concurrently referenced at shutdown, so add a refcount to protect it. Signed-off-by: Calvin Owens calvinow...@fb.com --- drivers/scsi/mpt2sas/mpt2sas_scsih.c | 28 1 file changed, 28 insertions(+) diff --git a/drivers/scsi/mpt2sas

[PATCH 3/6] Fix unsafe sas_device_list usage

2015-06-08 Thread Calvin Owens
We cannot iterate over the list without holding a lock for the entire duration, or we risk corrupting random memory if items are added or deleted as we iterate. This refactors code such that it always holds the lock when iterating on or accessing the sas_device_list. Signed-off-by: Calvin Owens
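As an illustration of the rule stated in this message, here is a minimal userspace sketch in which every traversal holds the lock from the first node to the last. It is not the driver's code: `struct node`, `struct dev_list`, `dev_list_add()` and `dev_list_contains()` are hypothetical, and a pthread mutex is assumed in place of the driver's own lock.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical list node keyed by SAS address. */
struct node {
    unsigned long long sas_address;
    struct node *next;
};

/* Hypothetical device list protected by a single lock. */
struct dev_list {
    pthread_mutex_t lock;
    struct node *head;
};

/* Insert at the head while holding the lock. */
void dev_list_add(struct dev_list *l, struct node *n)
{
    pthread_mutex_lock(&l->lock);
    n->next = l->head;
    l->head = n;
    pthread_mutex_unlock(&l->lock);
}

/* Returns 1 if an entry with this address exists, 0 otherwise.
 * The lock is held for the entire walk, so no entry can be added
 * or removed while we follow the next pointers. */
int dev_list_contains(struct dev_list *l, unsigned long long addr)
{
    int found = 0;

    pthread_mutex_lock(&l->lock);
    for (struct node *n = l->head; n; n = n->next) {
        if (n->sas_address == addr) {
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&l->lock);
    return found;
}
```

The lookup returns a plain flag rather than a pointer on purpose: handing a node back to the caller after the lock is dropped would need a reference count on the node as well.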

[PATCH v3 2/2] mpt2sas: Refcount fw_events and fix unsafe list usage

2015-07-31 Thread Calvin Owens
() concurrently deletes items from the list. Cc: Christoph Hellwig h...@lst.de Signed-off-by: Calvin Owens calvinow...@fb.com --- Changes in v3: * Add a break condition to the REMOVE_UNRESPONDING_DEVICES fw_event, which can loop over a sleep forever (5m+ at least) at unloading. I

[PATCH v3 0/2] Fixes for memory corruption in mpt2sas

2015-07-31 Thread Calvin Owens
Hello all, This patchset attempts to address problems we've been having with panics due to memory corruption from the mpt2sas driver. Changes are noted in the individual patches, I realized putting them in the cover was probably a bit confusing. Thanks, Calvin Patches in this series: [PATCH

[PATCH v3 1/2] mpt2sas: Refcount sas_device objects and fix unsafe list usage

2015-07-31 Thread Calvin Owens
-off-by: Calvin Owens calvinow...@fb.com --- Changes in v3: * Drop the sas_device_lock while enabling devices, and leave the sas_device object on the list, since it may need to be looked up there while it is being enabled. * Drop put() in _scsih_add_device

[PATCH v4 1/2] mpt2sas: Refcount sas_device objects and fix unsafe list usage

2015-08-13 Thread Calvin Owens
-off-by: Calvin Owens calvinow...@fb.com --- Changes in v4: * Fix lack of put() in non-SATA case in _scsih_change_queue_depth() * Fix lack of put() in the non-error case in _scsih_check_device() * Add missing put() at bottom of _scsih_add_device() * Add put

[PATCH v4 0/2] Fixes for memory corruption in mpt2sas

2015-08-13 Thread Calvin Owens
Hello all, This patchset attempts to address problems we've been having with panics due to memory corruption from the mpt2sas driver. Thanks, Calvin [PATCH v4 1/2] mpt2sas: Refcount sas_device objects and fix unsafe list [PATCH v4 2/2] mpt2sas: Refcount fw_events and fix unsafe list usage

Re: [PATCH v3 1/2] mpt2sas: Refcount sas_device objects and fix unsafe list usage

2015-08-13 Thread Calvin Owens
On Monday 08/10 at 18:45 +0530, Sreekanth Reddy wrote: On Sat, Aug 1, 2015 at 10:32 AM, Calvin Owens calvinow...@fb.com wrote: Sreekanth, Thanks for the review, responses below. I'll have a v4 out shortly. Calvin These objects can be referenced concurrently throughout the driver, we need

[PATCH v4 2/2] mpt2sas: Refcount fw_events and fix unsafe list usage

2015-08-13 Thread Calvin Owens
() concurrently deletes items from the list. Cc: Christoph Hellwig h...@lst.de Signed-off-by: Calvin Owens calvinow...@fb.com --- Changes in v4: None Changes in v3: * Add a break condition to the REMOVE_UNRESPONDING_DEVICES fw_event, which can loop over a sleep forever (5m+ at least

Re: [PATCH 1/2] mpt2sas: Refcount sas_device objects and fix unsafe list usage

2015-07-21 Thread Calvin Owens
On Thursday 07/16 at 20:27 +0530, Sreekanth Reddy wrote: On Sun, Jul 12, 2015 at 9:54 AM, Calvin Owens calvinow...@fb.com wrote: These objects can be referenced concurrently throughout the driver, we need a way to make sure threads can't delete them out from under each other. This patch

Re: [PATCH 1/2] mpt2sas: Refcount sas_device objects and fix unsafe list usage

2015-07-21 Thread Calvin Owens
On Monday 07/13 at 11:05 -0400, Joe Lawrence wrote: On 07/12/2015 12:24 AM, Calvin Owens wrote: These objects can be referenced concurrently throughout the driver, we need a way to make sure threads can't delete them out from under each other. This patch adds the refcount, and refactors

Re: [PATCH 1/2] mpt2sas: Refcount sas_device objects and fix unsafe list usage

2015-07-21 Thread Calvin Owens
On Sunday 07/12 at 23:52 -0700, Christoph Hellwig wrote: On Sat, Jul 11, 2015 at 09:24:55PM -0700, Calvin Owens wrote: These objects can be referenced concurrently throughout the driver, we need a way to make sure threads can't delete them out from under each other. This patch adds

[PATCH] sg: Fix double-free when drives detach during SG_IO

2015-10-30 Thread Calvin Owens
if it isn't embedded in the object itself. KASAN was extremely helpful in finding the root cause of this bug. Signed-off-by: Calvin Owens <calvinow...@fb.com> --- drivers/scsi/sg.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/sg.c b/drivers/scsi/

Re: [PATCH 5/6] Refactor code to use new fw_event refcount

2015-07-11 Thread Calvin Owens
Thanks for this, I'm sending a v2 shortly. On Friday 07/03 at 09:00 -0700, Christoph Hellwig wrote: On Mon, Jun 08, 2015 at 08:50:55PM -0700, Calvin Owens wrote: This refactors the fw_event code to use the new refcount. I spent some time looking over this code because it's so convoluted

Re: [PATCH 2/6] Refactor code to use new sas_device refcount

2015-07-11 Thread Calvin Owens
On Friday 07/03 at 08:38 -0700, Christoph Hellwig wrote: +struct _sas_device * +mpt2sas_scsih_sas_device_get_by_sas_address_nolock(struct MPT2SAS_ADAPTER *ioc, +u64 sas_address) Any chance to use a shorter name for this function? E.g. __mpt2sas_get_sdev_by_addr ? Will do.

Re: [PATCH 6/6] Fix unsafe fw_event_list usage

2015-07-11 Thread Calvin Owens
On Friday 07/03 at 09:02 -0700, Christoph Hellwig wrote: On Mon, Jun 08, 2015 at 08:50:56PM -0700, Calvin Owens wrote: Since the fw_event deletes itself from the list, cleanup_queue() can walk onto garbage pointers or walk off into freed memory. This refactors the code

[PATCH 2/2] mpt2sas: Refcount fw_events and fix unsafe list usage

2015-07-11 Thread Calvin Owens
() concurrently deletes items from the list. Cc: Christoph Hellwig h...@infradead.org Cc: Bart Van Assche bart.vanass...@sandisk.com Signed-off-by: Calvin Owens calvinow...@fb.com --- drivers/scsi/mpt2sas/mpt2sas_scsih.c | 101 --- 1 file changed, 81 insertions(+), 20 deletions

[PATCH 0/2 v2] Fixes for memory corruption in mpt2sas

2015-07-11 Thread Calvin Owens
Hello all, This patchset attempts to address problems we've been having with panics due to memory corruption from the mpt2sas driver. Thanks, Calvin Patches in this series: [PATCH 1/2] mpt2sas: Refcount sas_device objects and fix unsafe list usage [PATCH 2/2] mpt2sas: Refcount fw_events and fix

[PATCH 1/2] mpt2sas: Refcount sas_device objects and fix unsafe list usage

2015-07-11 Thread Calvin Owens
, or we risk corrupting random memory if items are added or deleted as we iterate. This patch refactors _scsih_probe_sas() to use the sas_device_list in a safe way. Cc: Christoph Hellwig h...@infradead.org Cc: Bart Van Assche bart.vanass...@sandisk.com Signed-off-by: Calvin Owens calvinow...@fb.com

Re: [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas

2015-08-26 Thread Calvin Owens
On Wednesday 08/26 at 04:09 +, Nicholas A. Bellinger wrote: From: Nicholas Bellinger n...@linux-iscsi.org Hi James & Co, This series is a mpt3sas forward port of Calvin Owens' in-flight reference counting bugfixes for mpt2sas LLD code here: [PATCH v4 0/2] Fixes for memory corruption

Re: [PATCH 2/2] mpt3sas: Refcount fw_events and fix unsafe list usage

2015-08-26 Thread Calvin Owens
_scsih_fw_event_cleanup_queue() such that it no longer iterates over the list without holding the lock, since _firmware_event_work() concurrently deletes items from the list. This patch is a port of Calvin's PATCH-v4 for mpt2sas code. Cc: Calvin Owens calvinow...@fb.com Cc: Christoph Hellwig h...@infradead.org Cc

Re: [PATCH] ses: Fix racy cleanup of /sys in remove_dev()

2016-06-02 Thread Calvin Owens
On 05/13/2016 01:28 PM, Calvin Owens wrote: Currently we free the resources backing the enclosure device before we call device_unregister(). This is racy: during rmmod of low-level SCSI drivers that hook into enclosure, we end up with a small window of time during which writing to /sys can OOPS

Re: [PATCH] ses: Fix racy cleanup of /sys in remove_dev()

2016-06-15 Thread Calvin Owens
On Thursday 06/02 at 15:50 -0700, Calvin Owens wrote: > On 05/13/2016 01:28 PM, Calvin Owens wrote: > > Currently we free the resources backing the enclosure device before we > > call device_unregister(). This is racy: during rmmod of low-level SCSI > > drivers that hook into

[PATCH] mpt3sas: Don't overreach ioc->reply_post[] during initialization

2016-03-19 Thread Calvin Owens
pt3sas] do_one_initcall+0x113/0x2b0 do_init_module+0x1d0/0x4d8 load_module+0x6729/0x8dc0 SYSC_init_module+0x183/0x1a0 SyS_init_module+0xe/0x10 entry_SYSCALL_64_fastpath+0x12/0x6a Fix this by pulling the value at the beginning of the loop. Signed-off-by:

Re: [PATCH] mpt3sas: Do scsi_remove_host() before deleting SAS PHY objects

2016-05-16 Thread Calvin Owens
On Friday 05/13 at 21:17 +, Elliott, Robert (Persistent Memory) wrote: > > > > -Original Message- > > From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel- > > ow...@vger.kernel.org] On Behalf Of Calvin Owens > > Sent: Friday, May 13, 2016 3:2

Re: [PATCH] mpt3sas: Do scsi_remove_host() before deleting SAS PHY objects

2016-05-16 Thread Calvin Owens
e_port() explicitly is HPSA, and it does it in the opposite order mpt3sas does: scsi_remove_host() first. Thanks, Calvin > -Original Message- > From: Calvin Owens [mailto:calvinow...@fb.com] > Sent: Monday, May 16, 2016 2:25 PM > To: Elliott, Robert (Persistent Memory) > Cc: Sath

[PATCH] ses: Fix racy cleanup of /sys in remove_dev()

2016-05-13 Thread Calvin Owens
driver core holds a reference over ->remove_dev(), so AFAICT this is safe. Signed-off-by: Calvin Owens <calvinow...@fb.com> --- drivers/scsi/ses.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c index 53ef1cb..0e8601a 100644 ---

Re: [PATCH] mpt3sas: Do scsi_remove_host() before deleting SAS PHY objects

2016-05-18 Thread Calvin Owens
On Wednesday 05/18 at 18:44 +0530, Sreekanth Reddy wrote: > On Tue, May 17, 2016 at 8:43 AM, Calvin Owens <calvinow...@fb.com> wrote: > > On Monday 05/16 at 15:51 -0600, Sathya Prakash Veerichetty wrote: > >> Our understanding is the relationship between the SCSI host

[PATCH 1/3] mpt3sas: Eliminate conditional locking in mpt3sas_scsih_issue_tm()

2016-07-28 Thread Calvin Owens
This flag that conditionally acquires the mutex is confusing and prone to bugginess: refactor it into two separate function calls, and make the unlocked one complain if it's called outside the mutex. Signed-off-by: Calvin Owens <calvinow...@fb.com> --- drivers/scsi/mpt3sas/mpt3sas_base.h
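The refactoring described here, one entry point that takes the mutex and one that expects the caller to already hold it, can be sketched in userspace as follows. This is a simplified illustration, not the patch: `struct adapter`, `issue_tm()` and `issue_tm_locked_by_caller()` are hypothetical names, and the `tm_mutex_held` flag is a crude assumption standing in for a real "is the lock held?" check (kernel code commonly uses `lockdep_assert_held()` for that purpose; the snippet does not show what this patch uses). The flag records only that some thread holds the mutex, not which one.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical adapter state. */
struct adapter {
    pthread_mutex_t tm_mutex;
    int tm_mutex_held;   /* debugging aid, written only under tm_mutex */
    int tm_issued;       /* counts task-management requests issued */
};

/* Variant for callers that already hold tm_mutex.
 * Complains and returns -1 if the mutex does not appear to be held. */
int issue_tm_locked_by_caller(struct adapter *a)
{
    if (!a->tm_mutex_held) {
        fprintf(stderr, "issue_tm called without tm_mutex held\n");
        return -1;
    }
    a->tm_issued++;
    return 0;
}

/* Variant for everyone else: takes the mutex, then calls the
 * lock-expecting variant. No flag argument decides the locking. */
int issue_tm(struct adapter *a)
{
    int ret;

    pthread_mutex_lock(&a->tm_mutex);
    a->tm_mutex_held = 1;
    ret = issue_tm_locked_by_caller(a);
    a->tm_mutex_held = 0;
    pthread_mutex_unlock(&a->tm_mutex);
    return ret;
}
```

With two functions, the locking expectation is visible at every call site instead of being hidden in a boolean argument.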

[PATCH 3/3] mpt3sas: Fix warnings exposed by W=1

2016-07-28 Thread Calvin Owens
with the potential error is non-trivial, so for now just WARN(). Signed-off-by: Calvin Owens <calvinow...@fb.com> --- drivers/scsi/mpt3sas/mpt3sas_base.c | 18 +++- drivers/scsi/mpt3sas/mpt3sas_config.c| 4 +- drivers/scsi/mpt3sas/mpt3sas_ctl.c | 29 ++--- drivers/scsi/m

[PATCH 2/3] mpt3sas: Eliminate dead sleep_flag code

2016-07-28 Thread Calvin Owens
With the exception of a single call to wait_for_doorbell_int(), all this conditional sleeping code is dead. So delete it. Signed-off-by: Calvin Owens <calvinow...@fb.com> --- drivers/scsi/mpt3sas/mpt3sas_base.c | 241 +-- drivers/scsi/mpt3sas/mpt3sas_

Re: [PATCH] ses: Fix racy cleanup of /sys in remove_dev()

2016-07-27 Thread Calvin Owens
On 06/15/2016 01:24 PM, Calvin Owens wrote: On Thursday 06/02 at 15:50 -0700, Calvin Owens wrote: On 05/13/2016 01:28 PM, Calvin Owens wrote: Currently we free the resources backing the enclosure device before we call device_unregister(). This is racy: during rmmod of low-level SCSI drivers

[PATCH] mpt3sas: Ensure the connector_name string is NUL-terminated

2016-07-27 Thread Calvin Owens
byte beyond our character array happens to be a NUL. Fix this by explicitly writing '\0' to the end of the string to ensure we don't run off the edge of the world in printk(). Signed-off-by: Calvin Owens <calvinow...@fb.com> --- drivers/scsi/mpt3sas/mpt3sas_base.h | 2 +- drivers/scsi/m
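The fix described here, writing the terminator explicitly instead of relying on the next byte happening to be zero, looks roughly like the sketch below. The field width `CONNECTOR_NAME_LEN` and the helper `copy_connector_name()` are hypothetical, chosen only for illustration; the actual sizes and call sites are in the truncated diff.

```c
#include <string.h>

#define CONNECTOR_NAME_LEN 4   /* hypothetical width of the raw field */

/* Copy a fixed-width, possibly unterminated field into dst and
 * terminate it explicitly. dst must have room for
 * CONNECTOR_NAME_LEN + 1 bytes. */
void copy_connector_name(char *dst, const char *src_field)
{
    memcpy(dst, src_field, CONNECTOR_NAME_LEN);
    dst[CONNECTOR_NAME_LEN] = '\0';   /* never rely on the next byte */
}
```

The destination buffer is one byte larger than the raw field so that the terminator always has a slot of its own, which keeps any later `printf("%s")`-style use from reading past the array.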

Re: [BUG] Slab corruption during XFS writeback under memory pressure

2016-07-18 Thread Calvin Owens
On 07/17/2016 11:02 PM, Dave Chinner wrote: On Sun, Jul 17, 2016 at 10:00:03AM +1000, Dave Chinner wrote: On Fri, Jul 15, 2016 at 05:18:02PM -0700, Calvin Owens wrote: Hello all, I've found a nasty source of slab corruption. Based on seeing similar symptoms on boxes at Facebook, I suspect

Re: [BUG] Slab corruption during XFS writeback under memory pressure

2016-07-19 Thread Calvin Owens
On 07/18/2016 07:05 PM, Calvin Owens wrote: On 07/17/2016 11:02 PM, Dave Chinner wrote: On Sun, Jul 17, 2016 at 10:00:03AM +1000, Dave Chinner wrote: On Fri, Jul 15, 2016 at 05:18:02PM -0700, Calvin Owens wrote: Hello all, I've found a nasty source of slab corruption. Based on seeing similar

[BUG] Slab corruption during XFS writeback under memory pressure

2016-07-15 Thread Calvin Owens
Hello all, I've found a nasty source of slab corruption. Based on seeing similar symptoms on boxes at Facebook, I suspect it's been around since at least 3.10. It only reproduces under memory pressure so far as I can tell: the issue seems to be that XFS reclaims pages from buffers that are