On 2018-04-14 21:45, Timo Nentwig wrote:
> On 04/14/2018 11:42 AM, Qu Wenruo wrote:
>> And the workload when the RO happens is also helpful.
>> (Well, the dmesg from when the RO happens would be the best though)
> Surprisingly nothing special AFAIR. It's a private, mostly idle machine.
> Probably "just" browsing with chrome.
> I didn't notice the remount right away as there were no obvious
> failures. And even then I kept it running for a couple more hours/a day
> or so.
>
> I had a glance at dmesg but don't remember anything specific (think the
> usual "---- [cut here] ---" + dump of registers, but I'm not even sure
> about that). Sorry.
>
> Actually the same thing happened just a few days earlier and after a
> reboot (and maybe fsck) it was back up and good. Was optimistic it would
> go the same way this time as well :) In general I had to hard-reset (+
> fsck) a couple of times recently.
So, after each hard reset, fsck was executed and btrfs check exposed no
problems (before the RW mount)?

That's interesting.

> Except for the SSD it's an
> all-new machine that I'm still OC/stress-testing. But not when that
> particular event happened.

A little off-topic, but Linux + overclocking is not that common in my
opinion, especially since we have neither AMD Ryzen Master nor Intel XTU
under Linux.

>> Despite the above salvage method, please also consider providing the
>> following data, as your case is pretty special and may help us to catch
>> a long-hidden bug.
> If only I had known I would have saved dmesg! :)
> Sure, I'd be happy to help. If you need any more information just let me
> know.
>> 1) Extent tree dump
>>    Need the above 2 patches applied first.
>>
>>    # btrfs inspect dump-tree -t extent /dev/sda2 &> \
>>      /tmp/extent_tree_dump
>>
>>    If the above dump is too large, "grep -C20 166030671872" of the
>>    output is also good enough.
>
> I'll send you a link to the full dump directly.

The grepped result is good enough, so feel free to delete the full dump.

> item 16 key (166030671872 EXTENT_ITEM 4096) itemoff 3096 itemsize 51
> 	refs 1 gen 1702074 flags TREE_BLOCK
> 	tree block key (162793705472 EXTENT_ITEM 4096) level 0
> 	tree block backref root 2

So at least btrfs still considers that this tree block should belong to
the extent tree.

> item 17 key (166030671872 BLOCK_GROUP_ITEM 1073741824) itemoff 3072
> itemsize 24
> 	block group used 96915456 chunk_objectid 256 flags METADATA

Your metadata is the SINGLE profile, the default for SSDs. Nothing
special here.

Currently, the problem looks like a log tree block got allocated inside
the extent tree's space (the log tree is the only place where btrfs
allocates tree blocks without updating the extent tree). And when the
log tree got replayed, your fs got corrupted.

Did you have several hard resets before the fs remounted itself RO?
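The grep step suggested above can be sketched as follows. The sample
line below is only a hypothetical excerpt of a dump; on the real system
you would generate the full dump first and grep that instead:

```shell
#!/bin/sh
# On the real system, generate the dump first (needs the patches above):
#   btrfs inspect-internal dump-tree -t extent /dev/sda2 &> /tmp/extent_tree_dump
# then grep the offending bytenr with 20 lines of context:
#   grep -C20 166030671872 /tmp/extent_tree_dump
#
# Hypothetical sample of what a matching line looks like:
sample='item 16 key (166030671872 EXTENT_ITEM 4096) itemoff 3096 itemsize 51'
printf '%s\n' "$sample" | grep -c 166030671872
```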
>> 2) super block dump
>>    # btrfs inspect dump-super -f /dev/sda2
> superblock: bytenr=65536, device=/dev/sda2
> ---------------------------------------------------------
> csum_type		0 (crc32c)
> csum_size		4
> csum			0xef0068ba [match]
> bytenr			65536
> flags			0x1
> 			( WRITTEN )
> magic			_BHRfS_M [match]
> fsid			22e778f7-2499-4379-99d2-cdd399d1cc6e
> label			830
> generation		1706541

The offending tree block has generation 1705980, which is 561
generations ago. Although it's hard to map that to real-world time, at
least the problem was not directly caused by your first automatic RO
remount. The problem must have existed for a while.

> root			167104118784
> sys_array_size		97
> chunk_root_generation	1702072
> root_level		1
> chunk_root		186120536064
> chunk_root_level	1
> log_root		180056702976
> log_root_transid	0

Not sure if this is common; I need to double-check later.

> log_root_level		0
> total_bytes		63879249920
> bytes_used		36929691648
> sectorsize		4096
> nodesize		4096

The nodesize is not the default 16K. Any reason for this? (Maybe
performance?)

>> 3) Extra hardware info about your sda
>>    Things like SMART and the hardware model would also help here.
> smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.15.15-1-ARCH] (local build)
> Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
>
> === START OF INFORMATION SECTION ===
> Model Family:     Samsung based SSDs
> Device Model:     SAMSUNG SSD 830 Series

At least I haven't heard of many problems with Samsung SSDs, so I don't
think the hardware is to blame. (Unlike the Intel 600P.)

>> 4) The mount options of /dev/sda2
>
> /dev/sda2 / btrfs compress=zstd,discard,autodefrag,subvol=/
> 0 0

Discard used to cause some problems, but those should be fixed in recent
releases IIRC. Despite that, the discard mount option is not recommended
IIRC; a routine fstrim is preferred instead.
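For reference, nodesize is fixed at mkfs time and cannot be changed on
an existing filesystem. A minimal sketch (the device names are
placeholders for your own setup):

```shell
# Check the current nodesize of an existing filesystem (works unmounted):
#   btrfs inspect-internal dump-super /dev/sda2 | grep nodesize
#
# Nodesize can only be chosen when the filesystem is created; recreating
# with the 16K default would look like (destroys data, placeholder device):
#   mkfs.btrfs --nodesize 16384 /dev/sdXN
```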
>
> And if that matters (AFAIK subvolume mount options have no effect anyway):
>
> /dev/sda2 /var/lib/postgres btrfs
> compress=zstd,discard,nodatacow,subvol=var/lib/postgres 0 0

A relational database like PostgreSQL can cause a lot of fsync() calls;
at least this explains why the tree log is so large.

To be safe, it's recommended to use the notreelog mount option, which
degrades fsync() to sync() for btrfs, so no log tree will be used.
Although it brings a performance impact for fsync()-heavy workloads, it
could help us determine whether the tree log is really to blame.

> /dev/sda2 /var/cache btrfs
> compress=off,discard,subvol=var/cache 0 0
> /dev/sda2 /var/tmp btrfs
> compress=zstd,discard,subvol=var/tmp 0 0
>
>> Thanks,
>> Qu
>
> Got a couple of these:
> We seem to be looping a lot on /mnt/sda2/var/lib/postgres/data/.., do
> you want to keep going on ? (y/N/a): y

I'm not that familiar with btrfs-restore, so it's hard to say. But it
seems to report false alerts quite often, so keeping it running seems
fine.

Another way to verify that only your extent tree is corrupted is to run
btrfs inspect dump-tree on each subvolume:

# btrfs inspect dump-tree -t <subvolid> /dev/sda2 > /dev/null

If nothing is printed to stderr for any of your subvolids, then it
should be pretty safe.

Thanks for your info; it indeed provides pretty useful clues here.
The 4K nodesize (so a taller tree and smaller lock ranges) and the
database workload may be the key to the problem.

Thanks,
Qu

>
> Is this something I need to be worried about? Postgres did at least
> start up.
>
>
> Thanks a lot for your help!
> Timo
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
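The per-subvolume dump-tree check above can be scripted. A minimal
sketch, assuming the usual `ID <id> gen <gen> top level <n> path <path>`
shape of `btrfs subvolume list` output; the sample list below is
hypothetical, and on the real system you would pipe the live command
output instead and drop the echo:

```shell
#!/bin/sh
# Hypothetical sample of `btrfs subvolume list /` output; on a real
# system replace this variable with the live command's output.
sample_list='ID 257 gen 1706540 top level 5 path var/lib/postgres
ID 258 gen 1706538 top level 5 path var/cache
ID 259 gen 1706539 top level 5 path var/tmp'

# The second field of each line is the subvolume ID; dump each tree and
# discard stdout, so only errors (stderr) would be visible.
for id in $(printf '%s\n' "$sample_list" | awk '{print $2}'); do
    # Drop the echo on the real system to actually run the check:
    echo "btrfs inspect-internal dump-tree -t $id /dev/sda2 > /dev/null"
done
```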