On 05/27/2018 11:41 AM, Nikolay Borisov wrote:
> 
> 
> On 27.05.2018 08:50, Andrei Borzenkov wrote:
>> 23.05.2018 09:32, Nikolay Borisov пишет:
>>>
>>>
>>> On 22.05.2018 23:05, ein wrote:
>>>> Hello devs,
>>>>
>>>> I tested BTRFS in production for about a month:
>>>>
>>>> 21:08:17 up 34 days,  2:21,  3 users,  load average: 0.06, 0.02, 0.00
>>>>
>>>> Without power blackout, hardware failure, SSD's SMART is flawless etc.
>>>> The tests ended with:
>>>>
>>>> root@node0:~# dmesg | grep BTRFS | grep warn
>>>> 185:980:[2927472.393557] BTRFS warning (device dm-0): csum failed root
>>>> -9 ino 312 off 608284672 csum 0x7d03a376 expected csum 0x3163a9b7 mirror 1
>>>> 186:981:[2927472.394158] BTRFS warning (device dm-0): csum failed root
>>>> -9 ino 312 off 608284672 csum 0x7da1b152 expected csum 0x3163a9b7 mirror 1
>>>> 191:986:[2928224.169814] BTRFS warning (device dm-0): csum failed root
>>>> -9 ino 314 off 608284672 csum 0x7d03a376 expected csum 0x3163a9b7 mirror 1
>>>> 192:987:[2928224.171433] BTRFS warning (device dm-0): csum failed root
>>>> -9 ino 314 off 608284672 csum 0x7da1b152 expected csum 0x3163a9b7 mirror 1
>>>> 206:1001:[2928298.039516] BTRFS warning (device dm-0): csum failed root
>>>> -9 ino 319 off 608284672 csum 0x7d03a376 expected csum 0x3163a9b7 mirror 1
>>>> 207:1002:[2928298.043103] BTRFS warning (device dm-0): csum failed root
>>>> -9 ino 319 off 608284672 csum 0x7d03a376 expected csum 0x3163a9b7 mirror 1
>>>> 208:1004:[2932213.513424] BTRFS warning (device dm-0): csum failed root
>>>> 5 ino 219962 off 4564959232 csum 0xc616afb4 expected csum 0x5425e489
>>>> mirror 1
>>>> 209:1005:[2932235.666368] BTRFS warning (device dm-0): csum failed root
>>>> 5 ino 219962 off 16989835264 csum 0xd63ed5da expected csum 0x7429caa1
>>>> mirror 1
>>>> 210:1072:[2936767.229277] BTRFS warning (device dm-0): csum failed root
>>>> 5 ino 219915 off 82318458880 csum 0x83614341 expected csum 0x0b8706f8
>>>> mirror 1
>>>> 211:1073:[2936767.276229] BTRFS warning (device dm-0): csum failed root
>>>> 5 ino 219915 off 82318458880 csum 0x83614341 expected csum 0x0b8706f8
>>>> mirror 1
>>>>
>>>> Above has been revealed during below command and quite high IO usage by
>>>> few VMs (Linux on top Ext4 with firebird database, lots of random
>>>> read/writes, two others with Windows 2016 and Windows Update in the
>>>> background):
>>>
>>> I believe you are hitting the issue described here:
>>>
>>> https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg25656.html
>>>
>>> Essentially the way qemu operates on vm images atop btrfs is prone to
>>> producing such errors. As a matter of fact, other filesystems also
>>> suffer from this (i.e. pages modified while being written); however,
>>> due to the lack of CRCs on the data they don't detect it. Can you
>>> confirm that those inodes (312/314/319/219962/219915) belong to VM
>>> image files?
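For the two inodes reported against root 5 (the top-level subvolume), the inode-to-path mapping can be checked with btrfs-progs. A minimal sketch, assuming the filesystem is mounted at /mnt (adjust to the real mount point):

```shell
# Sketch: resolve the root-5 inodes from the csum warnings to file paths.
# Requires btrfs-progs and a mounted btrfs filesystem at /mnt; the
# root -9 inodes live in the data-relocation tree and are not resolvable
# this way. Skips quietly when btrfs-progs is not installed.
command -v btrfs >/dev/null 2>&1 || { echo "btrfs-progs not available"; exit 0; }
for ino in 219962 219915; do
    # Prints the path(s) of each inode within the subvolume
    btrfs inspect-internal inode-resolve "$ino" /mnt
done
```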
>>>
>>> IMHO the best course of action would be to disable checksumming for
>>> your VM image files.
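On btrfs, checksumming is disabled per file via the NOCOW attribute (nodatacow implies nodatasum), and the attribute only takes effect on files created empty. A sketch of the usual approach, assuming the libvirt image path from the config later in this mail; the nocow directory name is illustrative:

```shell
# Sketch: set +C (NOCOW) on a directory so new files inherit it, then
# copy the image in while the VM is shut down. chattr +C on an existing
# non-empty file does NOT reliably take effect. Requires a btrfs
# filesystem and root; exits quietly elsewhere.
dir=/var/lib/libvirt/images/nocow
mkdir -p "$dir" 2>/dev/null || { echo "cannot create $dir, skipping"; exit 0; }
chattr +C "$dir" 2>/dev/null || { echo "not btrfs, skipping"; exit 0; }
cp /var/lib/libvirt/images/db.raw "$dir/db.raw"
lsattr -d "$dir"    # the 'C' flag should be listed
```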
>>>
>>>
>>> For some background I suggest you read the following LWN articles:
>>>
>>> https://lwn.net/Articles/486311/
>>> https://lwn.net/Articles/442355/
>>>
>>
>> Hmm ... according to these articles, "pages under writeback are marked
>> as not being writable; any process attempting to write to such a page
>> will block until the writeback completes". And it says this feature has
>> been available since 3.0 and btrfs has it. So how come it still
>> happens? Were the stable-pages patches removed since then?
> 
> If you are using buffered writes, then yes, you won't have this problem.
> However, qemu by default bypasses the host's page cache and instead
> uses DIO:
> 
> https://btrfs.wiki.kernel.org/index.php/Gotchas#Direct_IO_and_CRCs

I can confirm that writing data to the filesystem on the guest side is
not buffered on the host with this config:

<disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source file='/var/lib/libvirt/images/db.raw'/>
      <target dev='vda' bus='virtio'/>
      [...]
</disk>

Because buff/cache memory usage stays unchanged on the host during heavy
sequential writes and there's no kworker/flush process committing the
data. How does qemu avoid dirty page buffering? There's nothing other
than ppoll, read, io_submit and write in strace:

read(52, "\1\0\0\0\0\0\0\0", 512)       = 8
io_submit(0x7f35367f7000, 2, [{pwritev, fildes=19,
  iovec=[{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=368640},
         {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=679936},
         {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=1048576},
         ...]}, ...])
read(38, "\3\0\0\0\0\0\0\0", 512)       = 8
ppoll([{fd=52, events=POLLIN|POLLERR|POLLHUP},
       {fd=38, events=POLLIN|POLLERR|POLLHUP},
       {fd=10, events=POLLIN|POLLERR|POLLHUP}],
      3, NULL, NULL, 8) = 1 ([{fd=52, revents=POLLIN}])
read(52, "\1\0\0\0\0\0\0\0", 512)       = 8
io_submit(0x7f35367f7000, 1, [{pwritev, fildes=19,
  iovec=[{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=368640},
         {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=1048576},
         ...]}])
ppoll([{fd=52, events=POLLIN|POLLERR|POLLHUP},
       {fd=38, events=POLLIN|POLLERR|POLLHUP},
       {fd=10, events=POLLIN|POLLERR|POLLHUP}],
      3, {tv_sec=0, tv_nsec=0}, NULL, 8) = 2 ([{fd=52, re[...]
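That io_submit on fildes=19 is Linux-native AIO on the image file; whether the descriptor really carries O_DIRECT can also be read straight from procfs. A sketch, assuming fd 19 from the strace above and a single running qemu process (process name and pid lookup are illustrative):

```shell
# Sketch: /proc/<pid>/fdinfo/<fd> lists the open flags in octal.
# On x86-64, O_DIRECT is 0x4000 (040000 octal); fd 19 is the image
# file seen in the strace. Skips quietly if qemu is not running.
pid=$(pgrep -o qemu-system-x86_64) || { echo "no qemu process found"; exit 0; }
grep '^flags:' "/proc/$pid/fdinfo/19"
# A flags value with the 040000 bit set confirms O_DIRECT, matching
# the cache='none' setting in the libvirt disk config above.
```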

-- 
PGP Public Key (RSA/4096b):
ID: 0xF2C6EA10
SHA-1: 51DA 40EE 832A 0572 5AD8 B3C0 7AFF 69E1 F2C6 EA10