Hi,

similar suspicious code path can be found in the queuecommand functions in 
other drivers too
these are -
pmcraid.c
arcmsr_hba.c
cc-ing maintainers - 
(but both drivers seem to be unmaintained for a while -
I've added Ching for arcmsr and Raghava for pmcraid)

please read this thread and verify that your driver and firmware is safe.

It is expected that a raid card controls the disk write cache (switches it off)
but this might not be true for all modes of operation a raid adapter handles
- pass through/non-RAID etc. In such case when disk write cache is enabled
and your driver skips the SYNCHRONIZE_CACHE command a FS corruption
is very much possible during unexpected power loss or even a clean shutdown.

tomash

On 17.10.2016 19:51, James Bottomley wrote:
> On Mon, 2016-10-17 at 23:06 +0530, Kashyap Desai wrote:
>>> -----Original Message-----
>>> From: James Bottomley [mailto:j...@linux.vnet.ibm.com]
>>> Sent: Monday, October 17, 2016 10:50 PM
>>> To: Ric Wheeler; Hannes Reinecke; Sumit Saxena; 
>>> linux-scsi@vger.kernel.org
>>> Cc: martin.peter...@oracle.com; the...@redhat.com;
>>> kashyap.de...@broadcom.com; Christoph Hellwig; Martin K. Petersen;
>>> Jeff
>>> Moyer; Gris Ge; Ewan Milne; Jens Axboe
>>> Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE
>>> command
>>> to firmware
>>>
>>> On Mon, 2016-10-17 at 12:27 -0400, Ric Wheeler wrote:
>>>> On 10/17/2016 12:20 PM, James Bottomley wrote:
>>>>> On Mon, 2016-10-17 at 09:01 -0400, Ric Wheeler wrote:
>>>>>> On 10/17/2016 07:34 AM, Hannes Reinecke wrote:
>>>>>>> On 10/17/2016 12:24 PM, Sumit Saxena wrote:
>>>>>>>> megaraid_sas driver returns SYNCHRONIZE_CACHE command
>>>>>>>> back to
>>>>>>>> SCSI layer without sending it to firmware as firmware
>>>>>>>> takes
>>>>>>>> care of flushing cache.  This patch will change the
>>>>>>>> driver
>>>>>>>> behavior wrt SYNCHRONIZE_CACHE handling. If underlying
>>>>>>>> firmware has support to handle the SYNCHORNIZE_CACHE,
>>>>>>>> driver
>>>>>>>> will send it for firmware otherwise complete it back to
>>>>>>>> SCSI
>>>>>>>> layer with SUCCESS immediately.  If  Firmware  handle
>>>>>>>> SYNCHORNIZE_CACHE for both VD and JBOD
>>>>>>>> "canHandleSyncCache"
>>>>>>>> bit in scratch pad register(offset
>>>>>>>> 0x00B4) will be set.
>>>>>>>>
>>>>>>>> This behavior can be controlled via module parameter and
>>>>>>>> user
>>>>>>>> can fallback to old behavior of returning
>>>>>>>> SYNCHRONIZE_CACHE by
>>>>>>>> driver only without sending it to firmware.
>>>>>>>>
>>>>>>>> Signed-off-by: Sumit Saxena <sumit.sax...@broadcom.com>
>>>>>>>> Signed-off-by: Kashyap Desai <kashyap.de...@broadcom.com>
>>>>>>>> ---
>>>>>>>>    drivers/scsi/megaraid/megaraid_sas.h        |  3 +++
>>>>>>>>    drivers/scsi/megaraid/megaraid_sas_base.c   | 14
>>>>>>>> ++++++---
>>>>>>>> -----
>>>>>>>>    drivers/scsi/megaraid/megaraid_sas_fusion.c |  3 +++
>>>>>>>>    drivers/scsi/megaraid/megaraid_sas_fusion.h |  2 ++
>>>>>>>>    4 files changed, 14 insertions(+), 8 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas.h
>>>>>>>> b/drivers/scsi/megaraid/megaraid_sas.h
>>>>>>>> index ca86c88..43fd14f 100644
>>>>>>>> --- a/drivers/scsi/megaraid/megaraid_sas.h
>>>>>>>> +++ b/drivers/scsi/megaraid/megaraid_sas.h
>>>>>>>> @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT {
>>>>>>>>    #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT    14
>>>>>>>>    #define MR_MAX_MSIX_REG_ARRAY                   16
>>>>>>>>    #define MR_RDPQ_MODE_OFFSET                 0X0
>>>>>>>> 0800
>>>>>>>> 000
>>>>>>>> +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET               0
>>>>>>>> X010
>>>>>>>> 0000
>>>>>>>> 0
>>>>>>>> +
>>>>>>>>    /*
>>>>>>>>    * register set for both 1068 and 1078 controllers
>>>>>>>>    * structure extended for 1078 registers @@ -2140,6
>>>>>>>> +2142,7
>>>>>>>> @@ struct megasas_instance {
>>>>>>>>        u8 is_imr;
>>>>>>>>        u8 is_rdpq;
>>>>>>>>        bool dev_handle;
>>>>>>>> +      bool fw_sync_cache_support;
>>>>>>>>    };
>>>>>>>>    struct MR_LD_VF_MAP {
>>>>>>>>        u32 size;
>>>>>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c
>>>>>>>> b/drivers/scsi/megaraid/megaraid_sas_base.c
>>>>>>>> index 092387f..a4a8e2f 100644
>>>>>>>> --- a/drivers/scsi/megaraid/megaraid_sas_base.c
>>>>>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
>>>>>>>> @@ -104,6 +104,10 @@ unsigned int scmd_timeout =
>>>>>>>> MEGASAS_DEFAULT_CMD_TIMEOUT;
>>>>>>>>    module_param(scmd_timeout, int, S_IRUGO);
>>>>>>>>    MODULE_PARM_DESC(scmd_timeout, "scsi command timeout
>>>>>>>> (10
>>>>>>>> -90s), default 90s. See megasas_reset_timer.");
>>>>>>>>
>>>>>>>> +bool block_sync_cache;
>>>>>>>> +module_param(block_sync_cache, bool, S_IRUGO);
>>>>>>>> +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE by
>>>>>>>> driver
>>>>>>>> Default: 0(Send it to firmware)");
>>>>>>>> +
>>>>>>>>    MODULE_LICENSE("GPL");
>>>>>>>>    MODULE_VERSION(MEGASAS_VERSION);
>>>>>>>>    MODULE_AUTHOR("megaraidlinux....@avagotech.com");
>>>>>>>> @@ -1700,16 +1704,10 @@ megasas_queue_command(struct
>>> Scsi_Host
>>>>>>>> *shost, struct scsi_cmnd *scmd)
>>>>>>>>                goto out_done;
>>>>>>>>        }
>>>>>>>>
>>>>>>>> -      switch (scmd->cmnd[0]) {
>>>>>>>> -      case SYNCHRONIZE_CACHE:
>>>>>>>> -              /*
>>>>>>>> -               * FW takes care of flush cache on its
>>>>>>>> own
>>>>>>>> -               * No need to send it down
>>>>>>>> -               */
>>>>>>>> +      if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) &&
>>>>>>>> +              (!instance->fw_sync_cache_support)) {
>>>>>>>>                scmd->result = DID_OK << 16;
>>>>>>>>                goto out_done;
>>>>>>>> -      default:
>>>>>>>> -              break;
>>>>>>>>        }
>>>>>>>>
>>>>>>>>        return instance->instancet
>>>>>>>> ->build_and_issue_cmd(instance, scmd);
>>>>>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c
>>>>>>>> b/drivers/scsi/megaraid/megaraid_sas_fusion.c
>>>>>>>> index 2159f6a..8237580 100644
>>>>>>>> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
>>>>>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
>>>>>>>> @@ -747,6 +747,9 @@ megasas_ioc_init_fusion(struct
>>>>>>>> megasas_instance *instance)
>>>>>>>>                ret = 1;
>>>>>>>>                goto fail_fw_init;
>>>>>>>>        }
>>>>>>>> +      if (!block_sync_cache)
>>>>>>>> +              instance->fw_sync_cache_support =
>>>>>>>> (scratch_pad_2
>>>>>>>> &
>>>>>>>> +                      MR_CAN_HANDLE_SYNC_CACHE_OFFSET)
>>>>>>>> ? 1
>>>>>>>> :
>>>>>>>> 0;
>>>>>>>>
>>>>>>>>        IOCInitMessage =
>>>>>>>>          dma_alloc_coherent(&instance->pdev->dev,
>>>>>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h
>>>>>>>> b/drivers/scsi/megaraid/megaraid_sas_fusion.h
>>>>>>>> index e3bee04..2857154 100644
>>>>>>>> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h
>>>>>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h
>>>>>>>> @@ -72,6 +72,8 @@
>>>>>>>>    #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET
>>>>>>>>  (0x0000030C)
>>>>>>>>    #define MPI2_REPLY_POST_HOST_INDEX_OFFSET   (0x00
>>>>>>>> 0000
>>>>>>>> 6C)
>>>>>>>>
>>>>>>>> +extern bool block_sync_cache;
>>>>>>>> +
>>>>>>>>    /*
>>>>>>>>     * Raid context flags
>>>>>>>>     */
>>>>>>>>
>>>>>>> Be extra careful with that.
>>>>>>>
>>>>>>> SYNCHRONIZE_CACHE has (potentially) a global scope, as
>>>>>>> there
>>>>>>> might be an array-wide cache, and a cache flush would
>>>>>>> affect the
>>>>>>> entire cache. Linux is using SYNCHRONIZE_CACHE as a per
>>>>>>> device
>>>>>>> setting, ie it assumes that the effects of a cache flush is
>>>>>>> restricted to the device in question.
>>>>>>>
>>>>>>> If this is _not_ the case I'd rather not enable it.
>>>>>>> Have you checked that enabling this functionality doesn't
>>>>>>> have
>>>>>>> negative performance impact?
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Hannes
>>>>>> This must go in - without this fix, there is no data
>>>>>> integrity for
>>>>>> any file system.
>>>>> That's not what I get from the change log.  What it says to me
>>>>> is
>>>>> that the caches are currently firmware managed.  Barring
>>>>> firmware
>>>>> bugs, that means that we currently don't have any integrity
>>>>> issues.
>>>> Your understanding (or the change log) is incorrect.
>>> There's no current way in English of construing "as firmware takes
>>> care of
>>> flushing cache" to mean its inverse, so the changelog needs
>>> updating to
>>> explain
>>> that firmware does *not* take care of cache flushing, so
>>> effectively
>>> nothing
>>> does and we'll need a cc to stable if the problem is that nothing
>>> takes
>>> care of
>>> flushing the drive caches.
>>>
>>> James
>> Sorry for confusion. More accurate to say  -
>>
>> Current megaraid_sas driver returns SYNCHRONIZE_CACHE command back to 
>> SCSI layer without sending it down to firmware as firmware supposed 
>> to takes care of flushing cache for all Virtual Disk at the time of 
>> system reboot/shutdown. Because of above FW rule driver coded wrongly 
>> and it added bug that SYNCHRONIZE_CACHE is not passed to FW for JBOD 
>> as well.  (MR Firmware wanted OS to manage SYNCHRONIZE_CACHE and pass 
>> the same to the Firmware. ). We figure out that even for VD, why 
>> driver should send back SYNC_CACHE command...let's have the same in
>> FW (giving all control in FW)
>>
>> New behavior described -
>>
>> IF application requests SYNCH_CACHE
>>  IF any FW which supports new API bit called canHandleSyncCache
>>   Driver sends SYNCH_CACHE command to the FW
>>   IF 'VirtualDisk'
>>       FW does not send SYNCH_CACHE to drives
>>       FW returns SUCCESS
>> ELSE IF 'JBOD'
>>      FW sends SYNCH_CACHE to drive
>>      FW obtains status from drive and returns same status back to
>> driver
>>   ENDIF
>>
>> Fixing this is robust if FW and driver changes are inline. See below 
>> for more detail.
>>
>> Case - 1
>> This patch is attempt to fix one level problem where Virtual Disks 
>> are not managed correctly in MR FW. There are some MR FW (E.a Gen2 
>> and Gen2.5 FW), which set WCE bit for Virtual Disk but FW is not 
>> reply correct for SYNCH_CACHE.  This was handled in past and carry 
>> forward (in driver + FW ) to all new generation products as well. We 
>> tried to collect all possible details why it was handled such way to 
>> provide better driver fix for this issue(mainly to avoid this FW 
>> checks and module  parameters etc..). But now it looks like to make 
>> generic fix (Device ID based etc..)is even risky, so went with this 
>> protective approach.  It is very risky to find out which Device ID 
>> and FW has capability to manage SYNC_CACHE, so we managed to figure 
>> out which are the new FW using FW capability bit.
>>
>> E.g Liberator/Thunderbolt MR FW SYNC_CACHE is write Unsupported 
>> command (this is a firmware bug) and immediately it will be failed to 
>> SML. This will cause File system to go Read-only.
> That's a better description ... what you're saying is that the patch
> doesn't actually fix the bug Ric is worrying about.
>
>> Case -2
>> One more thing which we are trying and known driver can send 
>> "SYNC_CACHE" unconditionally to all generation of FW.  For this we 
>> are doing more unit testing on various controllers/FW and planning to 
>> provide another patch to fix JBOD path for any FW. (It will not be 
>> based on FW capability/module parameter).
> OK, but we really need this ASAP.  As Ric said, data integrity is at
> risk until this is fixed.  Can you also reference the commit that
> introduced the problem so we know how far to do the sable backports?
>
>> If we remove module parameter, we can ask customer to disable WCE on
>> drive to get similar impact.
> Thanks,
>
> James
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to