On 12/4/18 6:38 PM, Guenter Roeck wrote:
> On Tue, Dec 04, 2018 at 03:47:46PM -0700, Jens Axboe wrote:
>> If we attempt a direct issue to a SCSI device, and it returns BUSY, then
>> we queue the request up normally. However, the SCSI layer may have
>> already set up SG tables etc. for this particular command. If we later
>> merge with this request, then the old tables are no longer valid. Once
>> we issue the IO, we only read/write the original part of the request,
>> not the new state of it.
>>
>> This causes data corruption, and is most often noticed with the file
>> system complaining about the just read data being invalid:
>>
>> [  235.934465] EXT4-fs error (device sda1): ext4_iget:4831: inode #7142: comm dpkg-query: bad extra_isize 24937 (inode size 256)
>>
>> because most of it is garbage...
>>
>> This doesn't happen from the normal issue path, as we will simply defer
>> the request to the hardware queue dispatch list if we fail. Once it's on
>> the dispatch list, we never merge with it.
>>
>> Fix this from the direct issue path by flagging the request as
>> REQ_NOMERGE so we don't change the size of it before issue.
>>
>> See also:
>>   https://bugzilla.kernel.org/show_bug.cgi?id=201685
>>
>> Fixes: 6ce3dd6eec1 ("blk-mq: issue directly if hw queue isn't busy in case of 'none'")
>> Signed-off-by: Jens Axboe <[email protected]>
> 
> Tested-by: Guenter Roeck <[email protected]>
> 
> ... on two systems affected by the problem.

Thanks for testing! And for being persistent in reproducing and
providing clues for getting this nailed.
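
For anyone following along, here is a rough user-space sketch of the
failure mode described in the quoted patch. It is only a toy model and
not the actual kernel code: struct toy_request and every toy_* function
are hypothetical names made up for illustration, and the nomerge field
merely stands in for REQ_NOMERGE. It assumes a driver that snapshots the
request size when a direct issue is first attempted, which is roughly
what setting up SG tables amounts to.

```c
#include <stdbool.h>

/* Hypothetical toy model; this is not the kernel's struct request. */
struct toy_request {
    unsigned int len;          /* current size of the request in bytes */
    unsigned int prepared_len; /* size snapshotted at prep time, 0 if unprepared */
    bool nomerge;              /* stands in for REQ_NOMERGE */
};

/* Driver prep: snapshot the size once, like building SG tables. */
static void toy_prepare(struct toy_request *rq)
{
    if (!rq->prepared_len)
        rq->prepared_len = rq->len;
}

/*
 * Direct issue attempt where the device reports busy. The driver has
 * already prepared the request. With apply_fix set, the request is
 * flagged so that it can no longer be merged with.
 */
static bool toy_direct_issue_busy(struct toy_request *rq, bool apply_fix)
{
    toy_prepare(rq);
    if (apply_fix)
        rq->nomerge = true;
    return false; /* busy: caller queues the request normally */
}

/* Try to merge 'extra' bytes into a queued request. */
static bool toy_try_merge(struct toy_request *rq, unsigned int extra)
{
    if (rq->nomerge)
        return false;
    rq->len += extra;
    return true;
}

/* Later issue: a prepared request transfers only the snapshotted size. */
static unsigned int toy_issue(const struct toy_request *rq)
{
    return rq->prepared_len ? rq->prepared_len : rq->len;
}

/* Returns the number of bytes that were requested but never transferred. */
unsigned int toy_run(bool apply_fix)
{
    struct toy_request rq = { .len = 4096 };

    (void)toy_direct_issue_busy(&rq, apply_fix);
    (void)toy_try_merge(&rq, 4096);
    return rq.len - toy_issue(&rq);
}
```

In this model toy_run(false) leaves 4096 bytes untransferred, because the
merge grew the request after the size was snapshotted, while
toy_run(true) leaves none, because the merge is refused.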

-- 
Jens Axboe
