Here is a rephrased description of the problem.
When the host issues shutdown followed by D3hot during suspend, the NVMe drive
may pick a stale pointer that garbage collection (GC) has already used, which
then causes an over-program.
The sequence is: GC starts before shutdown -> host deletes the I/O queues ->
host issues shutdown -> GC is interrupted -> D3hot -> drive enters PS4 ->
a block swap may occur -> firmware uses the wrong pointer in device SRAM ->
over-program.
The issue only occurs with simple suspend (shutdown + D3hot) and specific
firmware on the Kahoku board.
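
As a rough illustration only, the ordering described above can be written as
a check over an event trace. This is a minimal sketch, not firmware or driver
code: the event names, FAILURE_SEQUENCE, and matches_failure_sequence are all
hypothetical labels made up for this example, and the sketch only tests
whether the steps occur in the described order.

```python
# Hypothetical event names for the steps described in this message.
# They are assumptions for illustration, not real firmware or kernel symbols.
FAILURE_SEQUENCE = [
    "gc_start",          # GC begins before shutdown
    "delete_io_queues",  # host deletes the I/O queues
    "host_shutdown",     # host issues shutdown
    "gc_interrupted",    # GC is broken off partway
    "d3hot",             # host puts the device into D3hot
    "enter_ps4",         # drive enters power state 4
    "block_swap",        # a block swap happens to occur
]


def matches_failure_sequence(trace):
    """Return True if every step of FAILURE_SEQUENCE appears in `trace`
    in the same order (unrelated events may be interleaved)."""
    events = iter(trace)
    # `step in events` advances the iterator, so order is enforced.
    return all(step in events for step in FAILURE_SEQUENCE)
```

A trace that lacks the block swap, or that has the steps in a different
order, does not match, which reflects that the over-program needs the whole
sequence to occur.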


Regards,
Gloria Tsai
_____________________________________

Sales PM Division
Solid State Storage Technology Corporation
TEL: +886-3-612-3888 ext. 2201
E-Mail: gloria.t...@ssstc.com
_____________________________________

-----Original Message-----
From: Christoph Hellwig <h...@lst.de> 
Sent: Tuesday, November 3, 2020 2:13 AM
To: Jongpil Jung <jongp...@gmail.com>
Cc: Keith Busch <kbu...@kernel.org>; Jens Axboe <ax...@fb.com>; Christoph 
Hellwig <h...@lst.de>; Sagi Grimberg <s...@grimberg.me>; 
linux-n...@lists.infradead.org; linux-kernel@vger.kernel.org; Gloria Tsai 
<gloria.t...@ssstc.com>; jongpil19.j...@samsung.com; jongheony....@samsung.com; 
dj54.s...@samsung.com
Subject: Re: [PATCH V3 1/1] nvme: Add quirk for LiteON CL1 devices running FW 
220TQ,22001





On Thu, Oct 29, 2020 at 03:55:29PM +0100, Christoph Hellwig wrote:
> I'm still worried about this.
>
> If power state based suspend does always work despite a HMB and is 
> preferred for the specific Google board we should have purely a DMI 
> based quirk for the board independent of the NVMe controller used with 
> it.
>
> But if these LiteON devices can't properly handle nvme_dev_disable 
> calls we have much deeper problems, because it can be called in all 
> kinds of places, including suspending when not on this specific board.
>
> That being said, I still really do not understand this sentence and 
> thus the problem at all:
>
> > When NVMe device receive D3hot from host, NVMe firmware will do 
> > garbage collection. While NVMe device do Garbage collection, 
> > firmware has chance to going incorrect address.

Any progress in describing the problem a little better?
