Re: [PATCH RESEND 0/8] btrfs-progs: sub: Relax the privileges of "subvolume list/show"
Misono Tomohiro - 27.11.18, 06:24: > Importantly, in order to make output consistent for both root and > non-privileged user, this changes the behavior of "subvolume list": > - (default) Only list in subvolume under the specified path. >Path needs to be a subvolume. Does that work recursively? I wound find it quite unexpected if I did btrfs subvol list in or on the root directory of a BTRFS filesystem would not display any subvolumes on that filesystem no matter where they are. Thanks, -- Martin
Re: Interpreting `btrfs filesystem show'
Hugo Mills - 15.10.18, 16:26: > On Mon, Oct 15, 2018 at 05:24:08PM +0300, Anton Shepelev wrote: > > Hello, all > > > > While trying to resolve free space problems, and found that > > > > I cannot interpret the output of: > > > btrfs filesystem show > > > > Label: none uuid: 8971ce5b-71d9-4e46-ab25-ca37485784c8 > > Total devices 1 FS bytes used 34.06GiB > > devid1 size 40.00GiB used 37.82GiB path /dev/sda2 > > > > How come the total used value is less than the value listed > > for the only device? > >"Used" on the device is the mount of space allocated. "Used" on the > FS is the total amount of actual data and metadata in that > allocation. > >You will also need to look at the output of "btrfs fi df" to see > the breakdown of the 37.82 GiB into data, metadata and currently > unused. I usually use btrfs fi usage -T, cause 1. It has all the information. 2. It differentiates between used and allocated. % btrfs fi usage -T / Overall: Device size: 100.00GiB Device allocated: 54.06GiB Device unallocated: 45.94GiB Device missing: 0.00B Used: 46.24GiB Free (estimated): 25.58GiB (min: 25.58GiB) Data ratio: 2.00 Metadata ratio: 2.00 Global reserve: 70.91MiB (used: 0.00B) Data Metadata System Id Path RAID1RAID1 RAID1Unallocated -- - --- 2 /dev/mapper/msata-debian 25.00GiB 2.00GiB 32.00MiB22.97GiB 1 /dev/mapper/sata-debian 25.00GiB 2.00GiB 32.00MiB22.97GiB -- - --- Total25.00GiB 2.00GiB 32.00MiB45.94GiB Used 22.38GiB 754.66MiB 16.00KiB For RAID it in some place reports the raw size and sometimes the logical size. Especially in the "Total" line I find this a bit inconsistent. "RAID1" columns show logical size, "Unallocated" shows raw size. Also "Used:" in the global section shows raw size and "Free (estimated):" shows logical size. Thanks -- Martin
Re: BTRFS related kernel backtrace on boot on 4.18.7 after blackout due to discharged battery
Filipe Manana - 05.10.18, 17:21: > On Fri, Oct 5, 2018 at 3:23 PM Martin Steigerwald wrote: > > Hello! > > > > On ThinkPad T520 after battery was discharged and machine just > > blacked out. > > > > Is that some sign of regular consistency check / replay or something > > to investigate further? > > I think it's harmless, if anything were messed up with link counts or > mismatches between those and dir entries, fsck (btrfs check) should > have reported something. > I'll dig a big further and remove the warning if it's really harmless. I just scrubbed the filesystem. I did not run btrfs check on it. > > I already scrubbed all data and there are no errors. Also btrfs > > device stats reports no errors. SMART status appears to be okay as > > well on both SSD. > > > > [4.524355] BTRFS info (device dm-4): disk space caching is > > enabled [… backtrace …] -- Martin
BTRFS related kernel backtrace on boot on 4.18.7 after blackout due to discharged battery
Hello! On ThinkPad T520 after battery was discharged and machine just blacked out. Is that some sign of regular consistency check / replay or something to investigate further? I already scrubbed all data and there are no errors. Also btrfs device stats reports no errors. SMART status appears to be okay as well on both SSD. [4.524355] BTRFS info (device dm-4): disk space caching is enabled [4.524356] BTRFS info (device dm-4): has skinny extents [4.563950] BTRFS info (device dm-4): enabling ssd optimizations [5.463085] Console: switching to colour frame buffer device 240x67 [5.492236] i915 :00:02.0: fb0: inteldrmfb frame buffer device [5.882661] BTRFS info (device dm-3): disk space caching is enabled [5.882664] BTRFS info (device dm-3): has skinny extents [5.918579] SGI XFS with ACLs, security attributes, realtime, scrub, no debug enabled [5.927421] Adding 20971516k swap on /dev/mapper/sata-swap. Priority:-2 extents:1 across:20971516k SSDsc [5.935051] XFS (sdb1): Mounting V5 Filesystem [5.935218] XFS (sda1): Mounting V5 Filesystem [5.961100] XFS (sda1): Ending clean mount [5.970857] BTRFS info (device dm-3): enabling ssd optimizations [5.972358] XFS (sdb1): Ending clean mount [5.975955] WARNING: CPU: 1 PID: 1104 at fs/inode.c:342 inc_nlink+0x28/0x30 [5.978271] Modules linked in: xfs msr pktcdvd intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_hdmi pcbc arc4 snd_hda_codec_conexant snd_hda_codec_generic iwldvm mac80211 iwlwifi aesni_intel snd_hda_intel snd_hda_codec aes_x86_64 crypto_simd cryptd snd_hda_core glue_helper intel_cstate snd_hwdep intel_rapl_perf snd_pcm pcspkr input_leds i915 sg cfg80211 snd_timer thinkpad_acpi nvram drm_kms_helper snd soundcore tpm_tis tpm_tis_core drm rfkill ac tpm i2c_algo_bit fb_sys_fops battery rng_core video syscopyarea sysfillrect sysimgblt button evdev sbs sbshc coretemp bfq hdaps(O) tp_smapi(O) thinkpad_ec(O) loop ecryptfs cbc sunrpc mcryptd sha256_ssse3 sha256_generic encrypted_keys ip_tables x_tables autofs4 dm_mod [5.990499] btrfs xor zstd_decompress zstd_compress xxhash zlib_deflate raid6_pq libcrc32c crc32c_generic sr_mod cdrom sd_mod hid_lenovo hid_generic usbhid hid ahci libahci libata ehci_pci crc32c_intel psmouse i2c_i801 sdhci_pci cqhci lpc_ich sdhci ehci_hcd e1000e scsi_mod i2c_core mfd_core mmc_core usbcore usb_common thermal [5.990529] CPU: 1 PID: 1104 Comm: mount Tainted: G O 4.18.7-tp520 #63 [5.990532] Hardware name: LENOVO 42433WG/42433WG, BIOS 8AET69WW (1.49 ) 06/14/2018 [6.000153] RIP: 0010:inc_nlink+0x28/0x30 [6.000154] Code: 00 00 8b 47 48 85 c0 74 07 83 c0 01 89 47 48 c3 f6 87 a1 00 00 00 04 74 11 48 8b 47 28 f0 48 ff 88 98 04 00 00 8b 47 48 eb df <0f> 0b eb eb 0f 1f 40 00 41 54 8b 0d 70 3f aa 00 48 ba eb 83 b5 80 [6.008573] RSP: 0018:c90002283828 EFLAGS: 00010246 [6.008575] RAX: RBX: 8804018bed58 RCX: 00022261 [6.008576] RDX: 00022251 RSI: RDI: 8804018bed58 [6.008577] RBP: c90002283a50 R08: 0002a330 R09: a02f3873 [6.008578] R10: 30ff R11: 7763 R12: 0011 [6.008579] R13: 3d5f R14: 880403e19800 R15: 88040a3c69a0 [6.008580] FS: 7f071598f100() GS:88041e24() knlGS: [6.008581] CS: 0010 DS: ES: CR0: 80050033 [6.008589] CR2: 7fda4fbf8218 CR3: 000403e42001 CR4: 000606e0 [6.008590] Call Trace: [6.008614] replay_one_buffer+0x80e/0x890 [btrfs] [6.008632] walk_up_log_tree+0x1dc/0x260 [btrfs] [6.046858] walk_log_tree+0xaf/0x1e0 [btrfs] [6.046872] btrfs_recover_log_trees+0x21c/0x410 [btrfs] [6.046885] ? btree_read_extent_buffer_pages+0xcd/0x210 [btrfs] [6.055941] ? fixup_inode_link_counts+0x170/0x170 [btrfs] [6.055953] open_ctree+0x1a0d/0x1b60 [btrfs] [6.055965] btrfs_mount_root+0x67b/0x760 [btrfs] [6.065039] ? pcpu_alloc_area+0xdd/0x120 [6.065040] ? pcpu_next_unpop+0x32/0x40 [6.065052] mount_fs+0x36/0x162 [6.065055] vfs_kern_mount.part.34+0x4f/0x120 [6.065064] btrfs_mount+0x15f/0x890 [btrfs] [6.065067] ? pcpu_cnt_pop_pages+0x40/0x50 [6.065069] ? pcpu_alloc_area+0xdd/0x120 [6.065071] ? pcpu_next_unpop+0x32/0x40 [6.065073] ? cpumask_next+0x16/0x20 [6.065075] ? pcpu_alloc+0x1c3/0x690 [6.065078] ? mount_fs+0x36/0x162 [6.099840] mount_fs+0x36/0x162 [6.099843] vfs_kern_mount.part.34+0x4f/0x120 [6.099845] do_mount+0x1f7/0xc80 [6.099856] ksys_mount+0xb5/0xd0 [6.099859] __x64_sys_mount+0x1c/0x20 [6.099861] do_syscall_64+0x43/0xd0 [6.099864] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [6.099866] RIP: 0033:0x7f0715b89a1a [6.099867] Code: 48 8b 0d 71 e4 0b 00 f7 d8 64 89 01 48 83 c8
Re: very poor performance / a lot of writes to disk with space_cache (but not with space_cache=v2)
Hans van Kranenburg - 19.09.18, 19:58: > However, as soon as we remount the filesystem with space_cache=v2 - > > > writes drop to just around 3-10 MB/s to each disk. If we remount to > > space_cache - lots of writes, system unresponsive. Again remount to > > space_cache=v2 - low writes, system responsive. > > > > That's a huuge, 10x overhead! Is it expected? Especially that > > space_cache=v1 is still the default mount option? > > Yes, that does not surprise me. > > https://events.static.linuxfound.org/sites/events/files/slides/vault20 > 16_0.pdf > > Free space cache v1 is the default because of issues with btrfs-progs, > not because it's unwise to use the kernel code. I can totally > recommend using it. The linked presentation above gives some good > background information. What issues in btrfs-progs are that? I am wondering whether to switch to freespace tree v2. Would it provide benefit for a regular / and /home filesystems as dual SSD BTRFS RAID-1 on a laptop? Thanks, -- Martin
Re: lazytime mount option—no support in Btrfs
waxhead - 18.08.18, 22:45: > Adam Hunt wrote: > > Back in 2014 Ted Tso introduced the lazytime mount option for ext4 > > and shortly thereafter a more generic VFS implementation which was > > then merged into mainline. His early patches included support for > > Btrfs but those changes were removed prior to the feature being > > merged. His> > > changelog includes the following note about the removal: > >- Per Christoph's suggestion, drop support for btrfs and xfs for > >now, > > > > issues with how btrfs and xfs handle dirty inode tracking. We > > can add btrfs and xfs support back later or at the end of this > > series if we want to revisit this decision. > > > > My reading of the current mainline shows that Btrfs still lacks any > > support for lazytime. Has any thought been given to adding support > > for lazytime to Btrfs? […] > Is there any new regarding this? I´d like to know whether there is any news about this as well. If I understand it correctly this could even help BTRFS performance a lot cause it is COW´ing metadata. Thanks, -- Martin
Re: Experiences on BTRFS Dual SSD RAID 1 with outage of one SSD
Roman Mamedov - 18.08.18, 09:12: > On Fri, 17 Aug 2018 23:17:33 +0200 > > Martin Steigerwald wrote: > > > Do not consider SSD "compression" as a factor in any of your > > > calculations or planning. Modern controllers do not do it anymore, > > > the last ones that did are SandForce, and that's 2010 era stuff. > > > You > > > can check for yourself by comparing write speeds of compressible > > > vs > > > incompressible data, it should be the same. At most, the modern > > > ones > > > know to recognize a stream of binary zeroes and have a special > > > case > > > for that. > > > > Interesting. Do you have any backup for your claim? > > Just "something I read". I follow quote a bit of SSD-related articles > and reviews which often also include a section to talk about the > controller utilized, its background and technological > improvements/changes -- and the compression going out of fashion > after SandForce seems to be considered a well-known fact. > > Incidentally, your old Intel 320 SSDs actually seem to be based on > that old SandForce controller (or at least license some of that IP to > extend on it), and hence those indeed might perform compression. Interesting. Back then I read the Intel SSD 320 would not compress. I think its difficult to know for sure with those proprietary controllers. > > As the data still needs to be transferred to the SSD at least when > > the SATA connection is maxed out I bet you won´t see any difference > > in write speed whether the SSD compresses in real time or not. > > Most controllers expose two readings in SMART: > > - Lifetime writes from host (SMART attribute 241) > - Lifetime writes to flash (attribute 233, or 177, or 173...) > > It might be difficult to get the second one, as often it needs to be > decoded from others such as "Average block erase count" or "Wear > leveling count". (And seems to be impossible on Samsung NVMe ones, > for example) I got the impression every manufacturer does their own thing here. And I would not even be surprised when its different between different generations of SSDs by one manufacturer. # Crucial mSATA SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x002f 100 100 000Pre-fail Always - 0 5 Reallocated_Sector_Ct 0x0033 100 100 000Pre-fail Always - 0 9 Power_On_Hours 0x0032 100 100 000Old_age Always - 16345 12 Power_Cycle_Count 0x0032 100 100 000Old_age Always - 4193 171 Program_Fail_Count 0x0032 100 100 000Old_age Always - 0 172 Erase_Fail_Count0x0032 100 100 000Old_age Always - 0 173 Wear_Leveling_Count 0x0032 078 078 000Old_age Always - 663 174 Unexpect_Power_Loss_Ct 0x0032 100 100 000Old_age Always - 362 180 Unused_Rsvd_Blk_Cnt_Tot 0x0033 000 000 000Pre-fail Always - 8219 183 SATA_Iface_Downshift0x0032 100 100 000Old_age Always - 1 184 End-to-End_Error0x0032 100 100 000Old_age Always - 0 187 Reported_Uncorrect 0x0032 100 100 000Old_age Always - 0 194 Temperature_Celsius 0x0022 046 020 000Old_age Always - 54 (Min/Max -10/80) 196 Reallocated_Event_Count 0x0032 100 100 000Old_age Always - 16 197 Current_Pending_Sector 0x0032 100 100 000Old_age Always - 0 198 Offline_Uncorrectable 0x0030 100 100 000Old_age Offline - 0 199 UDMA_CRC_Error_Count0x0032 100 100 000Old_age Always - 0 202 Percent_Lifetime_Used 0x0031 078 078 000Pre-fail Offline - 22 I expect the raw value of this to raise more slowly now there are almost 100 GiB completely unused and there is lots of free space in the filesystems. But even if not, the SSD is in use since March 2014. So it has plenty of time to go. 206 Write_Error_Rate0x000e 100 100 000Old_age Always - 0 210 Success_RAIN_Recov_Cnt 0x0032 100 100 000Old_age Always - 0 246 Total_Host_Sector_Write 0x0032 100 100 ---Old_age Always - 91288276930 ^^ In sectors. 91288276930 * 512 / 1024 / 1024 / 1024 ~= 43529 GiB Could be 4 KiB… but as its telling about Host_Sector and the value multiplied by eight does not make any sense, I bet its 512 Bytes. % smartctl /dev/sdb --all |grep "
Re: Experiences on BTRFS Dual SSD RAID 1 with outage of one SSD
Austin S. Hemmelgarn - 17.08.18, 14:55: > On 2018-08-17 08:28, Martin Steigerwald wrote: > > Thanks for your detailed answer. > > > > Austin S. Hemmelgarn - 17.08.18, 13:58: > >> On 2018-08-17 05:08, Martin Steigerwald wrote: […] > >>> Anyway, creating a new filesystem may have been better here > >>> anyway, > >>> cause it replaced an BTRFS that aged over several years with a new > >>> one. Due to the increased capacity and due to me thinking that > >>> Samsung 860 Pro compresses itself, I removed LZO compression. This > >>> would also give larger extents on files that are not fragmented or > >>> only slightly fragmented. I think that Intel SSD 320 did not > >>> compress, but Crucial m500 mSATA SSD does. That has been the > >>> secondary SSD that still had all the data after the outage of the > >>> Intel SSD 320. > >> > >> First off, keep in mind that the SSD firmware doing compression > >> only > >> really helps with wear-leveling. Doing it in the filesystem will > >> help not only with that, but will also give you more space to work > >> with.> > > While also reducing the ability of the SSD to wear-level. The more > > data I fit on the SSD, the less it can wear-level. And the better I > > compress that data, the less it can wear-level. > > No, the better you compress the data, the _less_ data you are > physically putting on the SSD, just like compressing a file makes it > take up less space. This actually makes it easier for the firmware > to do wear-leveling. Wear-leveling is entirely about picking where > to put data, and by reducing the total amount of data you are writing > to the SSD, you're making that decision easier for the firmware, and > also reducing the number of blocks of flash memory needed (which also > helps with SSD life expectancy because it translates to fewer erase > cycles). On one hand I can go with this, but: If I fill the SSD 99% with already compressed data, in case it compresses itself for wear leveling, it has less chance to wear level than with 99% of not yet compressed data that it could compress itself. That was the point I was trying to make. Sure, with a fill rate of about 46% for home, compression would help the wear leveling. And if the controller does not compress at all, it would also. Hmmm, maybe I enable "zstd", but on the other hand I save CPU cycles with not enabling it. > > However… I am not all that convinced that it would benefit me as > > long as I have enough space. That SSD replacement more than doubled > > capacity from about 680 TB to 1480 TB. I have ton of free space in > > the filesystems – usage of /home is only 46% for example – and > > there are 96 GiB completely unused in LVM on the Crucial SSD and > > even more than 183 GiB completely unused on Samsung SSD. The system > > is doing weekly "fstrim" on all filesystems. I think that this is > > more than is needed for the longevity of the SSDs, but well > > actually I just don´t need the space, so… > > > > Of course, in case I manage to fill up all that space, I consider > > using compression. Until then, I am not all that convinced that I´d > > benefit from it. > > > > Of course it may increase read speeds and in case of nicely > > compressible data also write speeds, I am not sure whether it even > > matters. Also it uses up some CPU cycles on a dual core (+ > > hyperthreading) Sandybridge mobile i5. While I am not sure about > > it, I bet also having larger possible extent sizes may help a bit. > > As well as no compression may also help a bit with fragmentation. > > It generally does actually. Less data physically on the device means > lower chances of fragmentation. In your case, it may not improve I thought "no compression" may help with fragmentation, but I think you think that "compression" helps with fragmentation and misunderstood what I wrote. > speed much though (your i5 _probably_ can't compress data much faster > than it can access your SSD's, which means you likely won't see much > performance benefit other than reducing fragmentation). > > > Well putting this to a (non-scientific) test: > > > > […]/.local/share/akonadi/db_data/akonadi> du -sh * | sort -rh | head > > -5 3,1Gparttable.ibd > > > > […]/.local/share/akonadi/db_data/akonadi> filefrag parttable.ibd > > parttable.ibd: 11583 extents found > > > > Hmmm, already quite many extents after just about one week with the > > new filesystem. On the old filesystem I had somewhat around > > 4-5 ex
Re: Experiences on BTRFS Dual SSD RAID 1 with outage of one SSD
Hi Roman. Now with proper CC. Roman Mamedov - 17.08.18, 14:50: > On Fri, 17 Aug 2018 14:28:25 +0200 > > Martin Steigerwald wrote: > > > First off, keep in mind that the SSD firmware doing compression > > > only > > > really helps with wear-leveling. Doing it in the filesystem will > > > help not only with that, but will also give you more space to > > > work with.> > > While also reducing the ability of the SSD to wear-level. The more > > data I fit on the SSD, the less it can wear-level. And the better I > > compress that data, the less it can wear-level. > > Do not consider SSD "compression" as a factor in any of your > calculations or planning. Modern controllers do not do it anymore, > the last ones that did are SandForce, and that's 2010 era stuff. You > can check for yourself by comparing write speeds of compressible vs > incompressible data, it should be the same. At most, the modern ones > know to recognize a stream of binary zeroes and have a special case > for that. Interesting. Do you have any backup for your claim? > As for general comment on this thread, always try to save the exact > messages you get when troubleshooting or getting failures from your > system. Saying just "was not able to add" or "btrfs replace not > working" without any exact details isn't really helpful as a bug > report or even as a general "experiences" story, as we don't know > what was the exact cause of those, could that have been avoided or > worked around, not to mention what was your FS state at the time (as > in "btrfs fi show" and "fi df"). I had a screen.log, but I put it on the filesystem after the backup was made, so it was lost. Anyway, the reason for not being able to add the device was the read only state of the BTRFS, as I wrote. Same goes for replace. I was able to read the error message just fine. AFAIR the exact wording was "read only filesystem". In any case: It was a experience report, no request for help, so I don´t see why exact error messages are absolutely needed. If I had a support inquiry that would be different, I agree. Thanks, -- Martin
Re: Experiences on BTRFS Dual SSD RAID 1 with outage of one SSD
Austin S. Hemmelgarn - 17.08.18, 15:01: > On 2018-08-17 08:50, Roman Mamedov wrote: > > On Fri, 17 Aug 2018 14:28:25 +0200 > > > > Martin Steigerwald wrote: > >>> First off, keep in mind that the SSD firmware doing compression > >>> only > >>> really helps with wear-leveling. Doing it in the filesystem will > >>> help not only with that, but will also give you more space to > >>> work with.>> > >> While also reducing the ability of the SSD to wear-level. The more > >> data I fit on the SSD, the less it can wear-level. And the better > >> I compress that data, the less it can wear-level. > > > > Do not consider SSD "compression" as a factor in any of your > > calculations or planning. Modern controllers do not do it anymore, > > the last ones that did are SandForce, and that's 2010 era stuff. > > You can check for yourself by comparing write speeds of > > compressible vs incompressible data, it should be the same. At > > most, the modern ones know to recognize a stream of binary zeroes > > and have a special case for that. > > All that testing write speeds forz compressible versus incompressible > data tells you is if the SSD is doing real-time compression of data, > not if they are doing any compression at all.. Also, this test only > works if you turn the write-cache on the device off. As the data still needs to be transferred to the SSD at least when the SATA connection is maxed out I bet you won´t see any difference in write speed whether the SSD compresses in real time or not. > Besides, you can't prove 100% for certain that any manufacturer who > does not sell their controller chips isn't doing this, which means > there are a few manufacturers that may still be doing it. Who really knows what SSD controller manufacturers are doing? I have not seen any Open Channel SSD stuff for laptops so far. Thanks, -- Martin
Re: Experiences on BTRFS Dual SSD RAID 1 with outage of one SSD
Thanks for your detailed answer. Austin S. Hemmelgarn - 17.08.18, 13:58: > On 2018-08-17 05:08, Martin Steigerwald wrote: […] > > I have seen a discussion about the limitation in point 2. That > > allowing to add a device and make it into RAID 1 again might be > > dangerous, cause of system chunk and probably other reasons. I did > > not completely read and understand it tough. > > > > So I still don´t get it, cause: > > > > Either it is a RAID 1, then, one disk may fail and I still have > > *all* > > data. Also for the system chunk, which according to btrfs fi df / > > btrfs fi sh was indeed RAID 1. If so, then period. Then I don´t see > > why it would need to disallow me to make it into an RAID 1 again > > after one device has been lost. > > > > Or it is no RAID 1 and then what is the point to begin with? As I > > was > > able to copy of all date of the degraded mount, I´d say it was a > > RAID 1. > > > > (I know that BTRFS RAID 1 is not a regular RAID 1 anyway, but just > > does two copies regardless of how many drives you use.) > > So, what's happening here is a bit complicated. The issue is entirely > with older kernels that are missing a couple of specific patches, but > it appears that not all distributions have their kernels updated to > include those patches yet. > > In short, when you have a volume consisting of _exactly_ two devices > using raid1 profiles that is missing one device, and you mount it > writable and degraded on such a kernel, newly created chunks will be > single-profile chunks instead of raid1 chunks with one half missing. > Any write has the potential to trigger allocation of a new chunk, and > more importantly any _read_ has the potential to trigger allocation of > a new chunk if you don't use the `noatime` mount option (because a > read will trigger an atime update, which results in a write). > > When older kernels then go and try to mount that volume a second time, > they see that there are single-profile chunks (which can't tolerate > _any_ device failures), and refuse to mount at all (because they > can't guarantee that metadata is intact). Newer kernels fix this > part by checking per-chunk if a chunk is degraded/complete/missing, > which avoids this because all the single chunks are on the remaining > device. How new the kernel needs to be for that to happen? Do I get this right that it would be the kernel used for recovery, i.e. the one on the live distro that needs to be new enough? To one on this laptop meanwhile is already 4.18.1. I used latest GRML stable release 2017.05 which has an 4.9 kernel. > As far as avoiding this in the future: I hope that with the new Samsung Pro 860 together with the existing Crucial m500 I am spared from this for years to come. That Crucial SSD according to SMART status about lifetime used has still quite some time to go. > * If you're just pulling data off the device, mark the device > read-only in the _block layer_, not the filesystem, before you mount > it. If you're using LVM, just mark the LV read-only using LVM > commands This will make 100% certain that nothing gets written to > the device, and thus makes sure that you won't accidentally cause > issues like this. > * If you're going to convert to a single device, > just do it and don't stop it part way through. In particular, make > sure that your system will not lose power. > * Otherwise, don't mount the volume unless you know you're going to > repair it. Thanks for those. Good to keep in mind. > > For this laptop it was not all that important but I wonder about > > BTRFS RAID 1 in enterprise environment, cause restoring from backup > > adds a significantly higher downtime. > > > > Anyway, creating a new filesystem may have been better here anyway, > > cause it replaced an BTRFS that aged over several years with a new > > one. Due to the increased capacity and due to me thinking that > > Samsung 860 Pro compresses itself, I removed LZO compression. This > > would also give larger extents on files that are not fragmented or > > only slightly fragmented. I think that Intel SSD 320 did not > > compress, but Crucial m500 mSATA SSD does. That has been the > > secondary SSD that still had all the data after the outage of the > > Intel SSD 320. > > First off, keep in mind that the SSD firmware doing compression only > really helps with wear-leveling. Doing it in the filesystem will help > not only with that, but will also give you more space to work with. While also reducing the ability of the SSD to wear-level. The more data I fit on the SSD, the less it can wear-level. And the better I compress that data, the less it can wear-level
Experiences on BTRFS Dual SSD RAID 1 with outage of one SSD
Hi! This happened about two weeks ago. I already dealt with it and all is well. Linux hung on suspend so I switched off this ThinkPad T520 forcefully. After that it did not boot the operating system anymore. Intel SSD 320, latest firmware, which should patch this bug, but apparently does not, is only 8 MiB big. Those 8 MiB just contain zeros. Access via GRML and "mount -fo degraded" worked. I initially was even able to write onto this degraded filesystem. First I copied all data to a backup drive. I even started a balance to "single" so that it would work with one SSD. But later I learned that secure erase may recover the Intel SSD 320 and since I had no other SSD at hand, did that. And yes, it did. So I canceled the balance. I partitioned the Intel SSD 320 and put LVM on it, just as I had it. But at that time I was not able to mount the degraded BTRFS on the other SSD as writable anymore, not even with "-f" "I know what I am doing". Thus I was not able to add a device to it and btrfs balance it to RAID 1. Even "btrfs replace" was not working. I thus formatted a new BTRFS RAID 1 and restored. A week later I migrated the Intel SSD 320 to a Samsung 860 Pro. Again via one full backup and restore cycle. However, this time I was able to copy most of the data of the Intel SSD 320 with "mount -fo degraded" via eSATA and thus the copy operation was way faster. So conclusion: 1. Pro: BTRFS RAID 1 really protected my data against a complete SSD outage. 2. Con: It does not allow me to add a device and balance to RAID 1 or replace one device that is already missing at this time. 3. I keep using BTRFS RAID 1 on two SSDs for often changed, critical data. 4. And yes, I know it does not replace a backup. As it was holidays and I was lazy backup was two weeks old already, so I was happy to have all my data still on the other SSD. 5. The error messages in kernel when mounting without "-o degraded" are less than helpful. They indicate a corrupted filesystem instead of just telling that one device is missing and "-o degraded" would help here. I have seen a discussion about the limitation in point 2. That allowing to add a device and make it into RAID 1 again might be dangerous, cause of system chunk and probably other reasons. I did not completely read and understand it tough. So I still don´t get it, cause: Either it is a RAID 1, then, one disk may fail and I still have *all* data. Also for the system chunk, which according to btrfs fi df / btrfs fi sh was indeed RAID 1. If so, then period. Then I don´t see why it would need to disallow me to make it into an RAID 1 again after one device has been lost. Or it is no RAID 1 and then what is the point to begin with? As I was able to copy of all date of the degraded mount, I´d say it was a RAID 1. (I know that BTRFS RAID 1 is not a regular RAID 1 anyway, but just does two copies regardless of how many drives you use.) For this laptop it was not all that important but I wonder about BTRFS RAID 1 in enterprise environment, cause restoring from backup adds a significantly higher downtime. Anyway, creating a new filesystem may have been better here anyway, cause it replaced an BTRFS that aged over several years with a new one. Due to the increased capacity and due to me thinking that Samsung 860 Pro compresses itself, I removed LZO compression. This would also give larger extents on files that are not fragmented or only slightly fragmented. I think that Intel SSD 320 did not compress, but Crucial m500 mSATA SSD does. That has been the secondary SSD that still had all the data after the outage of the Intel SSD 320. Overall I am happy, cause BTRFS RAID 1 gave me access to the data after the SSD outage. That is the most important thing about it for me. Thanks, -- Martin
Re: BTRFS and databases
Andrei Borzenkov - 02.08.18, 12:35: > Отправлено с iPhone > > > 2 авг. 2018 г., в 12:16, Martin Steigerwald > > написал(а):> > > Hugo Mills - 01.08.18, 10:56: > >>> On Wed, Aug 01, 2018 at 05:45:15AM +0200, MegaBrutal wrote: > >>> I know it's a decade-old question, but I'd like to hear your > >>> thoughts > >>> of today. By now, I became a heavy BTRFS user. Almost everywhere I > >>> use BTRFS, except in situations when it is obvious there is no > >>> benefit (e.g. /var/log, /boot). At home, all my desktop, laptop > >>> and > >>> server computers are mainly running on BTRFS with only a few file > >>> systems on ext4. I even installed BTRFS in corporate productive > >>> systems (in those cases, the systems were mainly on ext4; but > >>> there > >>> were some specific file systems those exploited BTRFS features). > >>> > >>> But there is still one question that I can't get over: if you > >>> store > >>> a > >>> database (e.g. MySQL), would you prefer having a BTRFS volume > >>> mounted > >>> with nodatacow, or would you just simply use ext4? > >>> > >> Personally, I'd start with btrfs with autodefrag. It has some > >> > >> degree of I/O overhead, but if the database isn't > >> performance-critical and already near the limits of the hardware, > >> it's unlikely to make much difference. Autodefrag should keep the > >> fragmentation down to a minimum. > > > > I read that autodefrag would only help with small databases. > > I wonder if anyone actually > > a) quantified performance impact > b) analyzed the cause > > I work with NetApp for a long time and I can say from first hand > experience that fragmentation had zero impact on OLTP workload. It > did affect backup performance as was expected, but this could be > fixed by periodic reallocation (defragmentation). > > And even that needed quite some time to observe (years) on pretty high > load database with regular backup and replication snapshots. > > If btrfs is so susceptible to fragmentation, what is the reason for > it? In the end of my original mail I mentioned a blog article that also had some performance graphs. Did you actually read it? Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BTRFS and databases
Hugo Mills - 01.08.18, 10:56: > On Wed, Aug 01, 2018 at 05:45:15AM +0200, MegaBrutal wrote: > > I know it's a decade-old question, but I'd like to hear your > > thoughts > > of today. By now, I became a heavy BTRFS user. Almost everywhere I > > use BTRFS, except in situations when it is obvious there is no > > benefit (e.g. /var/log, /boot). At home, all my desktop, laptop and > > server computers are mainly running on BTRFS with only a few file > > systems on ext4. I even installed BTRFS in corporate productive > > systems (in those cases, the systems were mainly on ext4; but there > > were some specific file systems those exploited BTRFS features). > > > > But there is still one question that I can't get over: if you store > > a > > database (e.g. MySQL), would you prefer having a BTRFS volume > > mounted > > with nodatacow, or would you just simply use ext4? > >Personally, I'd start with btrfs with autodefrag. It has some > degree of I/O overhead, but if the database isn't performance-critical > and already near the limits of the hardware, it's unlikely to make > much difference. Autodefrag should keep the fragmentation down to a > minimum. I read that autodefrag would only help with small databases. I also read that even on SSDs there is a notable performance penalty. 4.2 GiB akonadi database for tons of mails appears to work okayish on dual SSD BTRFS RAID 1 here with LZO compression here. However I have no comparison, for example how it would run on XFS. And its fragmented quite a bit, example for the largest file of 3 GiB – I know this in part is also due to LZO compression. […].local/share/akonadi/db_data/akonadi> time /usr/sbin/filefrag parttable.ibd parttable.ibd: 45380 extents found /usr/sbin/filefrag parttable.ibd 0,00s user 0,86s system 41% cpu 2,054 total However it digs out those extents quite fast. I would not feel comfortable with setting this file to nodatacow. However I wonder: Is this it? Is there nothing that can be improved in BTRFS to handle database and VM files in a better way, without altering any default settings? Is it also an issue on ZFS? ZFS does also copy on write. How does ZFS handle this? Can anything be learned from it? I never head people complain about poor database performance on ZFS, but… I don´t use it and I am not subscribed to any ZFS mailing lists, so they may have similar issues and I just do not know it. Well there seems to be a performance penalty at least when compared to XFS: About ZFS Performance Yves Trudeau, May 15, 2018 https://www.percona.com/blog/2018/05/15/about-zfs-performance/ The article described how you can use NVMe devices as cache to mitigate the performance impact. That would hint that BTRFS with VFS Hot Data Tracking and relocating data to SSD or NVMe devices could be a way to set this up. But as said I read about bad database performance even on SSDs with BTRFS. I do not find the original reference at the moment, but I got this for example, however it is from 2015 (on kernel 4.0 which is a bit old): Friends don't let friends use BTRFS for OLTP 2015/09/16 by Tomas Vondra https://blog.pgaddict.com/posts/friends-dont-let-friends-use-btrfs-for-oltp Interestingly it also compares with ZFS which is doing much better. So maybe there is really something to be learned from ZFS. I did not get clearly whether the benchmark was on an SSD, as Tomas notes the "ssd" mount option, it might have been. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Healthy amount of free space?
Nikolay Borisov - 17.07.18, 10:16: > On 17.07.2018 11:02, Martin Steigerwald wrote: > > Nikolay Borisov - 17.07.18, 09:20: > >> On 16.07.2018 23:58, Wolf wrote: > >>> Greetings, > >>> I would like to ask what what is healthy amount of free space to > >>> keep on each device for btrfs to be happy? > >>> > >>> This is how my disk array currently looks like > >>> > >>> [root@dennas ~]# btrfs fi usage /raid > >>> > >>> Overall: > >>> Device size: 29.11TiB > >>> Device allocated: 21.26TiB > >>> Device unallocated:7.85TiB > >>> Device missing: 0.00B > >>> Used: 21.18TiB > >>> Free (estimated): 3.96TiB (min: 3.96TiB) > >>> Data ratio: 2.00 > >>> Metadata ratio: 2.00 > >>> Global reserve: 512.00MiB (used: 0.00B) > > > > […] > > > >>> Btrfs does quite good job of evenly using space on all devices. > >>> No, > >>> how low can I let that go? In other words, with how much space > >>> free/unallocated remaining space should I consider adding new > >>> disk? > >> > >> Btrfs will start running into problems when you run out of > >> unallocated space. So the best advice will be monitor your device > >> unallocated, once it gets really low - like 2-3 gb I will suggest > >> you run balance which will try to free up unallocated space by > >> rewriting data more compactly into sparsely populated block > >> groups. If after running balance you haven't really freed any > >> space then you should consider adding a new drive and running > >> balance to even out the spread of data/metadata. > > > > What are these issues exactly? > > For example if you have plenty of data space but your metadata is full > then you will be getting ENOSPC. Of that one I am aware. This just did not happen so far. I did not yet add it explicitly to the training slides, but I just make myself a note to do that. Anything else? > > I have > > > > % btrfs fi us -T /home > > > > Overall: > > Device size: 340.00GiB > > Device allocated:340.00GiB > > Device unallocated:2.00MiB > > Device missing: 0.00B > > Used:308.37GiB > > Free (estimated): 14.65GiB (min: 14.65GiB) > > Data ratio: 2.00 > > Metadata ratio: 2.00 > > Global reserve: 512.00MiB (used: 0.00B) > > > > Data Metadata System > > > > Id Path RAID1 RAID1RAID1Unallocated > > -- -- - --- > > > > 1 /dev/mapper/msata-home 165.89GiB 4.08GiB 32.00MiB 1.00MiB > > 2 /dev/mapper/sata-home 165.89GiB 4.08GiB 32.00MiB 1.00MiB > > > > -- -- - --- > > > >Total 165.89GiB 4.08GiB 32.00MiB 2.00MiB > >Used 151.24GiB 2.95GiB 48.00KiB > > You already have only 33% of your metadata full so if your workload > turned out to actually be making more metadata-heavy changed i.e > snapshots you could exhaust this and get ENOSPC, despite having around > 14gb of free data space. Furthermore this data space is spread around > multiple data chunks, depending on how populated they are a balance > could be able to free up unallocated space which later could be > re-purposed for metadata (again, depending on what you are doing). The filesystem above IMO is not fit for snapshots. It would fill up rather quickly, I think even when I balance metadata. Actually I tried this and as I remember it took at most a day until it was full. If I read above figures currently at maximum I could gain one additional GiB by balancing metadata. That would not make a huge difference. I bet I am already running this filesystem beyond recommendation, as I bet many would argue it is to full already for regular usage… I do not see the benefit of squeezing the last free space out of it just to fit in another GiB. So I still do not get the point why it would make sense to balance it at this point in time. Especially as this 1 GiB I could regain is not even needed. And I do not see th
Re: Healthy amount of free space?
Hi Nikolay. Nikolay Borisov - 17.07.18, 09:20: > On 16.07.2018 23:58, Wolf wrote: > > Greetings, > > I would like to ask what what is healthy amount of free space to > > keep on each device for btrfs to be happy? > > > > This is how my disk array currently looks like > > > > [root@dennas ~]# btrfs fi usage /raid > > > > Overall: > > Device size: 29.11TiB > > Device allocated: 21.26TiB > > Device unallocated:7.85TiB > > Device missing: 0.00B > > Used: 21.18TiB > > Free (estimated): 3.96TiB (min: 3.96TiB) > > Data ratio: 2.00 > > Metadata ratio: 2.00 > > Global reserve: 512.00MiB (used: 0.00B) […] > > Btrfs does quite good job of evenly using space on all devices. No, > > how low can I let that go? In other words, with how much space > > free/unallocated remaining space should I consider adding new disk? > > Btrfs will start running into problems when you run out of unallocated > space. So the best advice will be monitor your device unallocated, > once it gets really low - like 2-3 gb I will suggest you run balance > which will try to free up unallocated space by rewriting data more > compactly into sparsely populated block groups. If after running > balance you haven't really freed any space then you should consider > adding a new drive and running balance to even out the spread of > data/metadata. What are these issues exactly? I have % btrfs fi us -T /home Overall: Device size: 340.00GiB Device allocated:340.00GiB Device unallocated:2.00MiB Device missing: 0.00B Used:308.37GiB Free (estimated): 14.65GiB (min: 14.65GiB) Data ratio: 2.00 Metadata ratio: 2.00 Global reserve: 512.00MiB (used: 0.00B) Data Metadata System Id Path RAID1 RAID1RAID1Unallocated -- -- - --- 1 /dev/mapper/msata-home 165.89GiB 4.08GiB 32.00MiB 1.00MiB 2 /dev/mapper/sata-home 165.89GiB 4.08GiB 32.00MiB 1.00MiB -- -- - --- Total 165.89GiB 4.08GiB 32.00MiB 2.00MiB Used 151.24GiB 2.95GiB 48.00KiB on a RAID-1 filesystem one, part of the time two Plasma desktops + KDEPIM and Akonadi + Baloo desktop search + you name it write to like mad. Since kernel 4.5 or 4.6 this simply works. Before that sometimes BTRFS crawled to an halt on searching for free blocks, and I had to switch off the laptop uncleanly. If that happened, a balance helped for a while. But since 4.5 or 4.6 this did not happen anymore. I found with SLES 12 SP 3 or so there is btrfsmaintenance running a balance weekly. Which created an issue on our Proxmox + Ceph on Intel NUC based opensource demo lab. This is for sure no recommended configuration for Ceph and Ceph is quite slow on these 2,5 inch harddisks and 1 GBit network link, despite albeit somewhat minimal, limited to 5 GiB m.2 SSD caching. What happened it that the VM crawled to a halt and the kernel gave task hung for more than 120 seconds messages. The VM was basically unusable during the balance. Sure that should not happen with a "proper" setup, also it also did not happen without the automatic balance. Also what would happen on a hypervisor setup with several thousands of VMs with BTRFS, when several 100 of them decide to start the balance at a similar time? It could probably bring the I/O system below to an halt, as many enterprise storage systems are designed to sustain burst I/O loads, but not maximum utilization during an extended period of time. I am really wondering what to recommend in my Linux performance tuning and analysis courses. On my own laptop I do not do regular balances so far. Due to my thinking: If it is not broken, do not fix it. My personal opinion here also is: If the filesystem degrades that much that it becomes unusable without regular maintenance from user space, the filesystem needs to be fixed. Ideally I would not have to worry on whether to regularly balance an BTRFS or not. In other words: I should not have to visit a performance analysis and tuning course in order to use a computer with BTRFS filesystem. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/2] btrfs: Check each block group has corresponding chunk at mount time
Nikolay Borisov - 03.07.18, 11:08: > On 3.07.2018 11:47, Qu Wenruo wrote: > > On 2018年07月03日 16:33, Nikolay Borisov wrote: > >> On 3.07.2018 11:08, Qu Wenruo wrote: > >>> Reported in https://bugzilla.kernel.org/show_bug.cgi?id=199837, if > >>> a > >>> crafted btrfs with incorrect chunk<->block group mapping, it could > >>> leads to a lot of unexpected behavior. > >>> > >>> Although the crafted image can be catched by block group item > >>> checker > >>> added in "[PATCH] btrfs: tree-checker: Verify block_group_item", > >>> if one crafted a valid enough block group item which can pass > >>> above check but still mismatch with existing chunk, it could > >>> cause a lot of undefined behavior. > >>> > >>> This patch will add extra block group -> chunk mapping check, to > >>> ensure we have a completely matching (start, len, flags) chunk > >>> for each block group at mount time. > >>> > >>> Reported-by: Xu Wen > >>> Signed-off-by: Qu Wenruo > >>> --- > >>> changelog: > >>> > >>> v2: > >>> Add better error message for each mismatch case. > >>> Rename function name, to co-operate with later patch. > >>> Add flags mismatch check. > >>> > >>> --- > >> > >> It's getting really hard to keep track of the various validation > >> patches you sent with multiple versions + new checks. Please batch > >> everything in a topic series i.e "Making checks stricter" or some > >> such and send everything again nicely packed, otherwise the risk > >> of mis-merging is increased. > > > > Indeed, I'll send the branch and push it to github. > > > >> I now see that Gu Jinxiang from fujitsu also started sending > >> validation fixes. > > > > No need to worry, that will be the only patch related to that thread > > of bugzilla from Fujitsu. > > As all the other cases can be addressed by my patches, sorry Fujitsu > > guys :)> > >> Also for evry patch which fixes a specific issue from one of the > >> reported on bugzilla.kernel.org just use the Link: tag to point to > >> the original report on bugzilla that will make it easier to relate > >> the fixes to the original report. > > > > Never heard of "Link:" tag. > > Maybe it's a good idea to added it to "submitting-patches.rst"? > > I guess it's not officially documented but if you do git log --grep > "Link:" you'd see quite a lot of patches actually have a Link pointing > to the original thread if it has sparked some pertinent discussion. > In this case those patches are a direct result of a bugzilla > bugreport so having a Link: tag makes sense. For Bugzilla reports I saw something like Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=43511 in a patch I was Cc´d to. Of course that does only apply if the patch in question fixes the reported bug. > In the example of the qgroup patch I sent yesterday resulting from > Misono's report there was also an involved discussion hence I added a > link to the original thread. […] -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: "decompress failed" in 1-2 files always causes kernel oops, check/scrub pass
Hey James. james harvey - 12.05.18, 07:08: > 100% reproducible, booting from disk, or even Arch installation ISO. > Kernel 4.16.7. btrfs-progs v4.16. > > Reading one of two journalctl files causes a kernel oops. Initially > ran into it from "journalctl --list-boots", but cat'ing the file does > it too. I believe this shows there's compressed data that is invalid, > but its btrfs checksum is invalid. I've cat'ed every file on the > disk, and luckily have the problems narrowed down to only these 2 > files in /var/log/journal. > > This volume has always been mounted with lzo compression. > > scrub has never found anything, and have ran it since the oops. > > Found a user a few years ago who also ran into this, without > resolution, at: > https://www.spinics.net/lists/linux-btrfs/msg52218.html > > 1. Cat'ing a (non-essential) file shouldn't be able to bring down the > system. > > 2. If this is infact invalid compressed data, there should be a way to > check for that. Btrfs check and scrub pass. I think systemd-journald sets those files to nocow on BTRFS in order to reduce fragmentation: That means no checksums, no snapshots, no nothing. I just removed /var/log/journal and thus disabled journalling to disk. Its sufficient for me to have the recent state in /run/journal. Can you confirm nocow being set via lsattr on those files? Still they should be decompressible just fine. > Hardware is fine. Passes memtest86+ in SMP mode. Works fine on all > other files. > > > > [ 381.869940] BUG: unable to handle kernel paging request at > 00390e50 [ 381.870881] BTRFS: decompress failed […] -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Read before you deploy btrfs + zstd
David Sterba - 15.11.17, 15:39: > On Tue, Nov 14, 2017 at 07:53:31PM +0100, David Sterba wrote: > > On Mon, Nov 13, 2017 at 11:50:46PM +0100, David Sterba wrote: > > > Up to now, there are no bootloaders supporting ZSTD. > > > > I've tried to implement the support to GRUB, still incomplete and hacky > > but most of the code is there. The ZSTD implementation is copied from > > kernel. The allocators need to be properly set up, as it needs to use > > grub_malloc/grub_free for the workspace thats called from some ZSTD_* > > functions. > > > > https://github.com/kdave/grub/tree/btrfs-zstd > > The branch is now in a state that can be tested. Turns out the memory > requirements are too much for grub, so the boot fails with "not enough > memory". The calculated value > > ZSTD_BTRFS_MAX_INPUT: 131072 > ZSTD_DStreamWorkspaceBound with ZSTD_BTRFS_MAX_INPUT: 549424 > > This is not something I could fix easily, we'd probalby need a tuned > version of ZSTD for grub constraints. Adding Nick to CC. Somehow I am happy that I still have a plain Ext4 for /boot. :) Thanks for looking into Grub support anyway. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Read before you deploy btrfs + zstd
David Sterba - 14.11.17, 19:49: > On Tue, Nov 14, 2017 at 08:34:37AM +0100, Martin Steigerwald wrote: > > Hello David. > > > > David Sterba - 13.11.17, 23:50: > > > while 4.14 is still fresh, let me address some concerns I've seen on > > > linux > > > forums already. > > > > > > The newly added ZSTD support is a feature that has broader impact than > > > just the runtime compression. The btrfs-progs understand filesystem with > > > ZSTD since 4.13. The remaining key part is the bootloader. > > > > > > Up to now, there are no bootloaders supporting ZSTD. This could lead to > > > an > > > unmountable filesystem if the critical files under /boot get > > > accidentally > > > or intentionally compressed by ZSTD. > > > > But otherwise ZSTD is safe to use? Are you aware of any other issues? > > No issues from my own testing or reported by other users. Thanks to you and the others. I think I try this soon. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Read before you deploy btrfs + zstd
Hello David. David Sterba - 13.11.17, 23:50: > while 4.14 is still fresh, let me address some concerns I've seen on linux > forums already. > > The newly added ZSTD support is a feature that has broader impact than > just the runtime compression. The btrfs-progs understand filesystem with > ZSTD since 4.13. The remaining key part is the bootloader. > > Up to now, there are no bootloaders supporting ZSTD. This could lead to an > unmountable filesystem if the critical files under /boot get accidentally > or intentionally compressed by ZSTD. But otherwise ZSTD is safe to use? Are you aware of any other issues? I consider switching from LZO to ZSTD on this ThinkPad T520 with Sandybridge. Thank you, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Data and metadata extent allocators [1/2]: Recap: The data story
Hello Hans, Hans van Kranenburg - 27.10.17, 20:17: > This is a followup to my previous threads named "About free space > fragmentation, metadata write amplification and (no)ssd" [0] and > "Experiences with metadata balance/convert" [1], exploring how good or > bad btrfs can handle filesystems that are larger than your average > desktop computer and/or which see a pattern of writing and deleting huge > amounts of files of wildly varying sizes all the time. […] > Q: How do I fight this and prevent getting into a situation where all > raw space is allocated, risking a filesystem crash? > A: Use btrfs balance to fight the symptoms. It reads data and writes it > out again without the free space fragments. What do you mean by a filesystem crash? Since kernel 4.5 or 4.6 I don´t see any BTRFS related filesystem hangs anymore on the /home BTRFS Dual SSD RAID 1 on my Laptop, which one or two copies of Akonadi, Baloo and other desktop related stuff write *heavily to* and which has all free space allocated into cunks since a pretty long time: merkaba:~> btrfs fi usage -T /home Overall: Device size: 340.00GiB Device allocated:340.00GiB Device unallocated:2.00MiB Device missing: 0.00B Used:290.32GiB Free (estimated): 23.09GiB (min: 23.09GiB) Data ratio: 2.00 Metadata ratio: 2.00 Global reserve: 512.00MiB (used: 0.00B) Data Metadata System Id Path RAID1 RAID1RAID1Unallocated -- -- - --- 1 /dev/mapper/msata-home 163.94GiB 6.03GiB 32.00MiB 1.00MiB 2 /dev/mapper/sata-home 163.94GiB 6.03GiB 32.00MiB 1.00MiB -- -- - --- Total 163.94GiB 6.03GiB 32.00MiB 2.00MiB Used 140.85GiB 4.31GiB 48.00KiB I didn´t do a balance on this filesystem since a long time (kernel 4.6). Granted my filesystem is smaller than the typical backup BTRFS. I do have two 3 TB and one 1,5 TB SATA disks I backup to and another 2 TB BTRFS on a backup server that I use for borgbackup (and that doesn´t yet do any snapshots and may be better of running as XFS as it doesn´t really need snapshots as borgbackup takes care of that. A BTRFS snapshot would only come handy to be able to go back to a previous borgbackup repo in case it for whatever reason gets corrupted or damaged / deleted by an attacker who only access to non privileged user). – However all of these filesystems have plenty of free space currently and are not accessed daily. > Q: Why would it crash the file system when all raw space is allocated? > Won't it start trying harder to reuse the free space inside? > A: Yes, it will, for data. The big problem here is that allocation of a > new metadata chunk when needed is not possible any more. And there it hangs or really crashes? […] > Q: Why do the pictures of my data block groups look like someone fired a > shotgun at it. [3], [4]? > A: Because the data extent allocator that is active when using the 'ssd' > mount option both tends to ignore smaller free space fragments all the > time, and also behaves in a way that causes more of them to appear. [5] > > Q: Wait, why is there "ssd" in my mount options? Why does btrfs think my > iSCSI attached lun is an SSD? > A: Because it makes wrong assumptions based on the rotational attribute, > which we can also see in sysfs. > > Q: Why does this ssd mode ignore free space? > A: Because it makes assumptions about the mapping of the addresses of > the block device we see in linux and the storage in actual flash chips > inside the ssd. Based on that information it decides where to write or > where not to write any more. > > Q: Does this make sense in 2017? > A: No. The interesting relevant optimization when writing to an ssd > would be to write all data together that will be deleted or overwritten > together at the same time in the future. Since btrfs does not come with > a time machine included, it can't do this. So, remove this behaviour > instead. [6] > > Q: What will happen when I use kernel 4.14 with the previously mentioned > change, or if I change to the nossd mount option explicitely already? > A: Relatively small free space fragments in existing chunks will > actually be reused for new writes that fit, working from the beginning > of the virtual address space upwards. It's like tetris, trying to > completely fill up the lowest lines first. See the big difference in > behavior when changing extent allocator happening at 16 seconds into > this timelapse movie: [7] (virtual address space) I see a difference in behavior but I do not yet fully understand what I am looking at. > Q: But what if all my chunks have badly fragmented free space right now? > A: If
Something like ZFS Channel Programs for BTRFS & probably XFS or even VFS?
[repost. I didn´t notice autocompletion gave me wrong address for fsdevel, blacklisted now] Hello. What do you think of http://open-zfs.org/wiki/Projects/ZFS_Channel_Programs ? There are quite some BTRFS maintenance programs like the deduplication stuff. Also regular scrubs… and in certain circumstances probably balances can make sense. In addition to this XFS got scrub functionality as well. Now putting the foundation for such a functionality in the kernel I think would only be reasonable if it cannot be done purely within user space, so I wonder about the safety from other concurrent ZFS modification and atomicity that are mentioned on the wiki page. The second set of slides, those the OpenZFS Developer Commit 2014, which are linked to on the wiki page explain this more. (I didn´t look the first ones, as I am no fan of slideshare.net and prefer a simple PDF to download and view locally anytime, not for privacy reasons alone, but also to avoid a using a crappy webpage over a wonderfully functional PDF viewer fat client like Okular) Also I wonder about putting a lua interpreter into the kernel, but it seems at least NetBSD developers added one to their kernel with version 7.0¹. I also ask this cause I wondered about a kind of fsmaintd or volmaintd for quite a while, and thought… it would be nice to do this in a generic way, as BTRFS is not the only filesystem which supports maintenance operations. However if it can all just nicely be done in userspace, I am all for it. [1] http://www.netbsd.org/releases/formal-7/NetBSD-7.0.html (tons of presentation PDFs on their site as well) Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Something like ZFS Channel Programs for BTRFS & probably XFS or even VFS?
Hello. What do you think of http://open-zfs.org/wiki/Projects/ZFS_Channel_Programs ? There are quite some BTRFS maintenance programs like the deduplication stuff. Also regular scrubs… and in certain circumstances probably balances can make sense. In addition to this XFS got scrub functionality as well. Now putting the foundation for such a functionality in the kernel I think would only be reasonable if it cannot be done purely within user space, so I wonder about the safety from other concurrent ZFS modification and atomicity that are mentioned on the wiki page. The second set of slides, those the OpenZFS Developer Commit 2014, which are linked to on the wiki page explain this more. (I didn´t look the first ones, as I am no fan of slideshare.net and prefer a simple PDF to download and view locally anytime, not for privacy reasons alone, but also to avoid a using a crappy webpage over a wonderfully functional PDF viewer fat client like Okular) Also I wonder about putting a lua interpreter into the kernel, but it seems at least NetBSD developers added one to their kernel with version 7.0¹. I also ask this cause I wondered about a kind of fsmaintd or volmaintd for quite a while, and thought… it would be nice to do this in a generic way, as BTRFS is not the only filesystem which supports maintenance operations. However if it can all just nicely be done in userspace, I am all for it. [1] http://www.netbsd.org/releases/formal-7/NetBSD-7.0.html (tons of presentation PDFs on their site as well) Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 4.11.6 / more corruption / root 15455 has a root item with a more recent gen (33682) compared to the found root node (0)
Hello Duncan. Duncan - 09.07.17, 11:17: > Paul Jones posted on Sun, 09 Jul 2017 09:16:36 + as excerpted: > >> Marc MERLIN - 08.07.17, 21:34: > >> > This is now the 3rd filesystem I have (on 3 different machines) that > >> > is getting corruption of some kind (on 4.11.6). > >> > >> Anyone else getting corruptions with 4.11? > >> > >> I happily switch back to 4.10.17 or even 4.9 if that is the case. I may > >> even do so just from your reports. Well, yes, I will do exactly that. I > >> just switch back for 4.10 for now. Better be safe, than sorry. > > > > No corruption for me - I've been on 4.11 since about .2 and everything > > seems fine. Currently on 4.11.8 > > No corruptions here either. 4.12.0 now, previously 4.12-rc5(ish, git), > before that 4.11.0. > > I have however just upgraded to new ssds then wiped and setup the old […] > Also, all my btrfs are raid1 or dup for checksummed redundancy, and > relatively small, the largest now 80 GiB per device, after the upgrade. > And my use-case doesn't involve snapshots or subvolumes. > > So any bug that is most likely on older filesystems, say those without > the no-holes feature, for instance, or that doesn't tend to hit raid1 or > dup mode, or that is less likely on small filesystems on fast ssds, or > that triggers most often with reflinks and thus on filesystems with > snapshots, is unlikely to hit me. Hmmm, the BTRFS filesystems on my laptop 3 to 5 or even more years old. I stick with 4.10 for now, I think. The older ones are RAID 1 across two SSDs, the newer one is single device, on one SSD. These filesystems didn´t fail me in years and since 4.5 or 4.6 even the "I search for free space" kernel hang (hung tasks and all that) is gone as well. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 4.11.6 / more corruption / root 15455 has a root item with a more recent gen (33682) compared to the found root node (0)
Hello Marc. Marc MERLIN - 08.07.17, 21:34: > Sigh, > > This is now the 3rd filesystem I have (on 3 different machines) that is > getting corruption of some kind (on 4.11.6). Anyone else getting corruptions with 4.11? I happily switch back to 4.10.17 or even 4.9 if that is the case. I may even do so just from your reports. Well, yes, I will do exactly that. I just switch back for 4.10 for now. Better be safe, than sorry. I know how you feel, Marc. I posted about a corruption on one of my backup harddisks here some time ago that btrfs check --repair wasn´t able to handle. I redid that disk from scratch and it took a long, long time. I agree with you that this has to stop. Before that I will never *ever* recommend this to a customer. Ideally no corruptions in stable kernels, especially when its a .6 at the end of the version number. But if so… then fixable. Other filesystems like Ext4 and XFS can do it… so this should be possible with BTRFS as well. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: runtime btrfsck
Stefan Priebe - Profihost AG - 10.05.17, 09:02: > I'm now trying btrfs progs 4.10.2. Is anybody out there who can tell me > something about the expected runtime or how to fix bad key ordering? I had a similar issue which remained unresolved. But I clearly saw that btrfs check was running in a loop, see thread: [4.9] btrfs check --repair looping over file extent discount errors So it would be interesting to see the exact output of btrfs check, maybe there is something like repeated numbers that also indicate a loop. I was about to say that BTRFS is production ready before this issue happened. I still think it for a lot of setup mostly is, as at least the "I get stuck on the CPU while searching for free space" issue seems to be gone since about anything between 4.5/4.6 kernels. I also think so regarding absence of data loss. I was able to copy over all of the data I needed of the broken filesystem. Yet, when it comes to btrfs check? Its still quite rudimentary if you ask me. So unless someone has a clever idea here and shares it with you, it may be needed to backup anything you can from this filesystem and then start over from scratch. As to my past experience something like xfs_repair surpasses btrfs check in the ability to actually fix broken filesystem by a great extent. Ciao, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [4.9] btrfs check --repair looping over file extent discount errors
Martin Steigerwald - 22.04.17, 20:01: > Chris Murphy - 22.04.17, 09:31: > > Is the file system created with no-holes? > > I have how to find out about it and while doing accidentally set that I didn´t find out how to find out about it and… > feature on another filesystem (btrfstune only seems to be able to enable > the feature, not show the current state of it). > > But as there is no notice of the feature being set as standard in manpage of > mkfs.btrfs as of BTRFS tools 4.9.1 and as I didn´t set it myself, I best > bet is that the feature is not enable on the filesystem. > > Now I wonder… how to disable the feature on that other filesystem again. -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [4.9] btrfs check --repair looping over file extent discount errors
Hello Chris. Chris Murphy - 22.04.17, 09:31: > Is the file system created with no-holes? I have how to find out about it and while doing accidentally set that feature on another filesystem (btrfstune only seems to be able to enable the feature, not show the current state of it). But as there is no notice of the feature being set as standard in manpage of mkfs.btrfs as of BTRFS tools 4.9.1 and as I didn´t set it myself, I best bet is that the feature is not enable on the filesystem. Now I wonder… how to disable the feature on that other filesystem again. Thanks, -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [4.9] btrfs check --repair looping over file extent discount errors
Hello. I am planning to copy of important data on the disk with the broken filesystem to the disk with the good filesystem and then reformatitting the disk with the broken filesystem soon, probably in the course of the day… so in case you want any debug information before that, let me know ASAP. Thanks, Martin Martin Steigerwald - 14.04.17, 21:35: > Hello, > > backup harddisk connected via eSATA. Hard kernel hang, mouse pointer > freezing two times seemingly after finishing /home backup and creating new > snapshot on source BTRFS SSD RAID 1 for / in order to backup it. I did > scrubbed / and it appears to be okay, but I didn´t run btrfs check on it. > Anyway deleting that subvolume works and I as I suspected an issue with the > backup disk I started with that one. > > I got > > merkaba:~> btrfs --version > btrfs-progs v4.9.1 > > merkaba:~> cat /proc/version > Linux version 4.9.20-tp520-btrfstrim+ (martin@merkaba) (gcc version 6.3.0 > 20170321 (Debian 6.3.0-11) ) #6 SMP PREEMPT Mon Apr 3 11:42:17 CEST 2017 > > merkaba:~> btrfs fi sh feenwald > Label: 'feenwald' uuid: […] > Total devices 1 FS bytes used 1.26TiB > devid1 size 2.73TiB used 1.27TiB path /dev/sdc1 > > on Debian unstable on ThinkPad T520 connected via eSATA port on Minidock. > > > I am now running btrfs check --repair on it after without --repair the > command reported file extent discount errors and it appears to loop on the > same file extent discount errors for ages. Any advice? > > I do have another backup harddisk with BTRFS that worked fine today, so I do > not need to recover that drive immediately. I may let it run for a little > more time, but then will abort the repair process as I really think its > looping just over and over and over the same issues again. At some time I > may just copy all the stuff that is on that harddisk, but not on the other > one over to the other one and mkfs.btrfs the filesystem again, but I´d > rather like to know whats happening here. > > Here is output: > > merkaba:~> btrfs check --repair /dev/sdc1 > enabling repair mode > Checking filesystem on /dev/sdc1 > [… UUID ommited …] > checking extents > Fixed 0 roots. > checking free space cache > cache and super generation don't match, space cache will be invalidated > checking fs roots > root 257 inode 4979842 errors 100, file extent discount > Found file extent holes: > start: 0, len: 78798848 > root 257 inode 4980212 errors 100, file extent discount > Found file extent holes: > start: 0, len: 143360 > root 257 inode 4980214 errors 100, file extent discount > Found file extent holes: > start: 0, len: 4227072 > root 257 inode 4979842 errors 100, file extent discount > Found file extent holes: > start: 0, len: 78798848 > root 257 inode 4980212 errors 100, file extent discount > Found file extent holes: > start: 0, len: 143360 > root 257 inode 4980214 errors 100, file extent discount > Found file extent holes: > start: 0, len: 4227072 > root 257 inode 4979842 errors 100, file extent discount > Found file extent holes: > start: 0, len: 78798848 > root 257 inode 4980212 errors 100, file extent discount > Found file extent holes: > start: 0, len: 143360 > root 257 inode 4980214 errors 100, file extent discount > Found file extent holes: > start: 0, len: 4227072 > root 257 inode 4979842 errors 100, file extent discount > Found file extent holes: > start: 0, len: 78798848 > root 257 inode 4980212 errors 100, file extent discount > Found file extent holes: > start: 0, len: 143360 > root 257 inode 4980214 errors 100, file extent discount > Found file extent holes: > start: 0, len: 4227072 > [… hours later …] > root 257 inode 4979842 errors 100, file extent discount > Found file extent holes: > start: 0, len: 78798848 > root 257 inode 4980212 errors 100, file extent discount > Found file extent holes: > start: 0, len: 143360 > root 257 inode 4980214 errors 100, file extent discount > Found file extent holes: > start: 0, len: 4227072 > root 257 inode 4979842 errors 100, file extent discount > Found file extent holes: > start: 0, len: 78798848 > root 257 inode 4980212 errors 100, file extent discount > Found file extent holes: > start: 0, len: 143360 > root 257 inode 4980214 errors 100, file extent discount > Found file extent holes: > start: 0, len: 4227072 > root 257 inode 4979842 errors 100, file extent discount > Found file extent holes: > start: 0, len: 78798848 > root 257 inode 4980212 errors 100, file extent discount > Found file extent holes:
[4.9] btrfs check --repair looping over file extent discount errors
Hello, backup harddisk connected via eSATA. Hard kernel hang, mouse pointer freezing two times seemingly after finishing /home backup and creating new snapshot on source BTRFS SSD RAID 1 for / in order to backup it. I did scrubbed / and it appears to be okay, but I didn´t run btrfs check on it. Anyway deleting that subvolume works and I as I suspected an issue with the backup disk I started with that one. I got merkaba:~> btrfs --version btrfs-progs v4.9.1 merkaba:~> cat /proc/version Linux version 4.9.20-tp520-btrfstrim+ (martin@merkaba) (gcc version 6.3.0 20170321 (Debian 6.3.0-11) ) #6 SMP PREEMPT Mon Apr 3 11:42:17 CEST 2017 merkaba:~> btrfs fi sh feenwald Label: 'feenwald' uuid: […] Total devices 1 FS bytes used 1.26TiB devid1 size 2.73TiB used 1.27TiB path /dev/sdc1 on Debian unstable on ThinkPad T520 connected via eSATA port on Minidock. I am now running btrfs check --repair on it after without --repair the command reported file extent discount errors and it appears to loop on the same file extent discount errors for ages. Any advice? I do have another backup harddisk with BTRFS that worked fine today, so I do not need to recover that drive immediately. I may let it run for a little more time, but then will abort the repair process as I really think its looping just over and over and over the same issues again. At some time I may just copy all the stuff that is on that harddisk, but not on the other one over to the other one and mkfs.btrfs the filesystem again, but I´d rather like to know whats happening here. Here is output: merkaba:~> btrfs check --repair /dev/sdc1 enabling repair mode Checking filesystem on /dev/sdc1 [… UUID ommited …] checking extents Fixed 0 roots. checking free space cache cache and super generation don't match, space cache will be invalidated checking fs roots root 257 inode 4979842 errors 100, file extent discount Found file extent holes: start: 0, len: 78798848 root 257 inode 4980212 errors 100, file extent discount Found file extent holes: start: 0, len: 143360 root 257 inode 4980214 errors 100, file extent discount Found file extent holes: start: 0, len: 4227072 root 257 inode 4979842 errors 100, file extent discount Found file extent holes: start: 0, len: 78798848 root 257 inode 4980212 errors 100, file extent discount Found file extent holes: start: 0, len: 143360 root 257 inode 4980214 errors 100, file extent discount Found file extent holes: start: 0, len: 4227072 root 257 inode 4979842 errors 100, file extent discount Found file extent holes: start: 0, len: 78798848 root 257 inode 4980212 errors 100, file extent discount Found file extent holes: start: 0, len: 143360 root 257 inode 4980214 errors 100, file extent discount Found file extent holes: start: 0, len: 4227072 root 257 inode 4979842 errors 100, file extent discount Found file extent holes: start: 0, len: 78798848 root 257 inode 4980212 errors 100, file extent discount Found file extent holes: start: 0, len: 143360 root 257 inode 4980214 errors 100, file extent discount Found file extent holes: start: 0, len: 4227072 [… hours later …] root 257 inode 4979842 errors 100, file extent discount Found file extent holes: start: 0, len: 78798848 root 257 inode 4980212 errors 100, file extent discount Found file extent holes: start: 0, len: 143360 root 257 inode 4980214 errors 100, file extent discount Found file extent holes: start: 0, len: 4227072 root 257 inode 4979842 errors 100, file extent discount Found file extent holes: start: 0, len: 78798848 root 257 inode 4980212 errors 100, file extent discount Found file extent holes: start: 0, len: 143360 root 257 inode 4980214 errors 100, file extent discount Found file extent holes: start: 0, len: 4227072 root 257 inode 4979842 errors 100, file extent discount Found file extent holes: start: 0, len: 78798848 root 257 inode 4980212 errors 100, file extent discount Found file extent holes: start: 0, len: 143360 root 257 inode 4980214 errors 100, file extent discount Found file extent holes: start: 0, len: 4227072 This basically seems to go on like this forever. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Convert from RAID 5 to 10
Am Mittwoch, 30. November 2016, 12:09:23 CET schrieb Chris Murphy: > On Wed, Nov 30, 2016 at 7:37 AM, Austin S. Hemmelgarn > >wrote: > > The stability info could be improved, but _absolutely none_ of the things > > mentioned as issues with raid1 are specific to raid1. And in general, in > > the context of a feature stability matrix, 'OK' generally means that there > > are no significant issues with that specific feature, and since none of > > the > > issues outlined are specific to raid1, it does meet that description of > > 'OK'. > > Maybe the gotchas page needs a one or two liner for each profile's > gotchas compared to what the profile leads the user into believing. > The overriding gotcha with all Btrfs multiple device support is the > lack of monitoring and notification other than kernel messages; and > the raid10 actually being more like raid0+1 I think it certainly a > gotcha, however 'man mkfs.btrfs' contains a grid that very clearly > states raid10 can only safely lose 1 device. Wow, that manpage is quite an resource. Developers, documentation people definitely improved the official BTRFS documentation. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Convert from RAID 5 to 10
Am Mittwoch, 30. November 2016, 16:49:59 CET schrieb Wilson Meier: > Am 30/11/16 um 15:37 schrieb Austin S. Hemmelgarn: > > On 2016-11-30 08:12, Wilson Meier wrote: > >> Am 30/11/16 um 11:41 schrieb Duncan: > >>> Wilson Meier posted on Wed, 30 Nov 2016 09:35:36 +0100 as excerpted: > >>>> Am 30/11/16 um 09:06 schrieb Martin Steigerwald: > >>>>> Am Mittwoch, 30. November 2016, 10:38:08 CET schrieb Roman Mamedov: […] > >> It is really disappointing to not have this information in the wiki > >> itself. This would have saved me, and i'm quite sure others too, a lot > >> of time. > >> Sorry for being a bit frustrated. > > I'm not angry or something like that :) . > I just would like to have the possibility to read such information about > the storage i put my personal data (> 3 TB) on its official wiki. Anyone can get an account on the wiki and add notes there, so feel free. You can even use footnotes or something like that. Maybe it would be good to add a paragraph there that features are related to one another, so while BTRFS RAID 1 for example might be quite okay, it depends on features that are still flaky. I for myself rely quite much on BTRFS RAID 1 with lzo compression and it seems to work okay for me. -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Convert from RAID 5 to 10
Am Mittwoch, 30. November 2016, 10:38:08 CET schrieb Roman Mamedov: > On Wed, 30 Nov 2016 00:16:48 +0100 > > Wilson Meierwrote: > > That said, btrfs shouldn't be used for other then raid1 as every other > > raid level has serious problems or at least doesn't work as the expected > > raid level (in terms of failure recovery). > > RAID1 shouldn't be used either: > > *) Read performance is not optimized: all metadata is always read from the > first device unless it has failed, data reads are supposedly balanced > between devices per PID of the process reading. Better implementations > dispatch reads per request to devices that are currently idle. > > *) Write performance is not optimized, during long full bandwidth sequential > writes it is common to see devices writing not in parallel, but with a long > periods of just one device writing, then another. (Admittedly have been > some time since I tested that). > > *) A degraded RAID1 won't mount by default. > > If this was the root filesystem, the machine won't boot. > > To mount it, you need to add the "degraded" mount option. > However you have exactly a single chance at that, you MUST restore the RAID > to non-degraded state while it's mounted during that session, since it > won't ever mount again in the r/w+degraded mode, and in r/o mode you can't > perform any operations on the filesystem, including adding/removing > devices. > > *) It does not properly handle a device disappearing during operation. > (There is a patchset to add that). > > *) It does not properly handle said device returning (under a > different /dev/sdX name, for bonus points). > > Most of these also apply to all other RAID levels. So the stability matrix would need to be updated not to recommend any kind of BTRFS RAID 1 at the moment? Actually I faced the BTRFS RAID 1 read only after first attempt of mounting it "degraded" just a short time ago. BTRFS still needs way more stability work it seems to me. -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
Am Donnerstag, 17. November 2016, 12:05:31 CET schrieb Chris Murphy: > I think the wiki should be updated to reflect that raid1 and raid10 > are mostly OK. I think it's grossly misleading to consider either as > green/OK when a single degraded read write mount creates single chunks > that will then prevent a subsequent degraded read write mount. And > also the lack of various notifications of device faultiness I think > make it less than OK also. It's not in the "do not use" category but > it should be in the middle ground status so users can make informed > decisions. I agree – as error reporting I think is indead misleading. Feel free to edit it. Ciao, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
Am Mittwoch, 16. November 2016, 07:57:08 CET schrieb Austin S. Hemmelgarn: > On 2016-11-16 06:04, Martin Steigerwald wrote: > > Am Mittwoch, 16. November 2016, 16:00:31 CET schrieb Roman Mamedov: > >> On Wed, 16 Nov 2016 11:55:32 +0100 > >> > >> Martin Steigerwald <martin.steigerw...@teamix.de> wrote: […] > > As there seems to be no force option to override the limitation and I > > do not feel like compiling my own btrfs-tools right now, I will use rsync > > instead. > > In a case like this, I'd trust rsync more than send/receive. The > following rsync switches might also be of interest: > -a: This turns on a bunch of things almost everyone wants when using > rsync, similar to the same switch for cp, just with even more added in. > -H: This recreates hardlinks on the receiving end. > -S: This recreates sparse files. > -A: This copies POSIX ACL's > -X: This copies extended attributes (most of them at least, there are a > few that can't be arbitrarily written to). > Pre-creating the subvolumes by hand combined with using all of those > will get you almost everything covered by send/receive except for > sharing of extents and ctime. I usually use rsync -aAHXSP already :). I was able to rsync any relevant data of the disk which is now being deleted by shred command. Thank you, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
Am Mittwoch, 16. November 2016, 11:55:32 CET schrieben Sie: > So mounting work although for some reason scrubbing is aborted (I had this > issue a long time ago on my laptop as well). After removing /var/lib/btrfs > scrub status file for the filesystem: > > merkaba:~> btrfs scrub start /mnt/zeit > scrub started on /mnt/zeit, fsid […] (pid=9054) > merkaba:~> btrfs scrub status /mnt/zeit > scrub status for […] > scrub started at Wed Nov 16 11:52:56 2016 and was aborted after > 00:00:00 > total bytes scrubbed: 0.00B with 0 errors > > Anyway, I will now just rsync off the files. > > Interestingly enough btrfs restore complained about looping over certain > files… lets see whether the rsync or btrfs send/receive proceeds through. I have an idea on why scrubbing may not work: The filesystem is mounted read only and on checksum errors on one disk scrub would try to repair it with the good copy from another disk. Yes, this is it: merkaba:~> btrfs scrub start -r /dev/satafp1/daten scrub started on /dev/satafp1/daten, fsid […] (pid=9375) merkaba:~> btrfs scrub status /dev/satafp1/daten scrub status for […] scrub started at Wed Nov 16 12:13:27 2016, running for 00:00:10 total bytes scrubbed: 45.53MiB with 0 errors It would be helpful to receive a proper error message on this one. Okay, seems today I learned quite something about BTRFS. Thanks, -- Martin Steigerwald | Trainer teamix GmbH Südwestpark 43 90449 Nürnberg Tel.: +49 911 30999 55 | Fax: +49 911 30999 99 mail: martin.steigerw...@teamix.de | web: http://www.teamix.de | blog: http://blog.teamix.de Amtsgericht Nürnberg, HRB 18320 | Geschäftsführer: Oliver Kügow, Richard Müller teamix Support Hotline: +49 911 30999-112 *** Bitte liken Sie uns auf Facebook: facebook.com/teamix *** -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
Am Mittwoch, 16. November 2016, 16:00:31 CET schrieb Roman Mamedov: > On Wed, 16 Nov 2016 11:55:32 +0100 > > Martin Steigerwald <martin.steigerw...@teamix.de> wrote: > > I do think that above kernel messages invite such a kind of interpretation > > tough. I took the "BTRFS: open_ctree failed" message as indicative to some > > structural issue with the filesystem. > > For the reason as to why the writable mount didn't work, check "btrfs fi df" > for the filesystem to see if you have any "single" profile chunks on it: > quite likely you did already mount it "degraded,rw" in the past *once*, > after which those "single" chunks get created, and consequently it won't > mount r/w anymore (without lifting the restriction on the number of missing > devices as proposed). That exactly explains it. I very likely did a degraded mount without ro on this disk already. Funnily enough this creates another complication: merkaba:/mnt/zeit#1> btrfs send somesubvolume | btrfs receive /mnt/ someotherbtrfs ERROR: subvolume /mnt/zeit/somesubvolume is not read-only Yet: merkaba:/mnt/zeit> btrfs property get somesubvolume ro=false merkaba:/mnt/zeit> btrfs property set somesubvolume ro true ERROR: failed to set flags for somesubvolume: Read-only file system To me it seems right logic would be to allow the send to proceed in case the whole filesystem is readonly. As there seems to be no force option to override the limitation and I do not feel like compiling my own btrfs-tools right now, I will use rsync instead. Thanks, -- Martin Steigerwald | Trainer teamix GmbH Südwestpark 43 90449 Nürnberg Tel.: +49 911 30999 55 | Fax: +49 911 30999 99 mail: martin.steigerw...@teamix.de | web: http://www.teamix.de | blog: http://blog.teamix.de Amtsgericht Nürnberg, HRB 18320 | Geschäftsführer: Oliver Kügow, Richard Müller teamix Support Hotline: +49 911 30999-112 *** Bitte liken Sie uns auf Facebook: facebook.com/teamix *** -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
Am Mittwoch, 16. November 2016, 15:43:36 CET schrieb Roman Mamedov: > On Wed, 16 Nov 2016 11:25:00 +0100 > > Martin Steigerwald <martin.steigerw...@teamix.de> wrote: > > merkaba:~> mount -o degraded,clear_cache /dev/satafp1/backup /mnt/zeit > > mount: Falscher Dateisystemtyp, ungültige Optionen, der > > Superblock von /dev/mapper/satafp1-backup ist beschädigt, fehlende > > Kodierungsseite oder ein anderer Fehler > > > > Manchmal liefert das Systemprotokoll wertvolle Informationen – > > versuchen Sie dmesg | tail oder ähnlich > > > > merkaba:~#32> dmesg | tail -6 > > [ 3080.120687] BTRFS info (device dm-13): allowing degraded mounts > > [ 3080.120699] BTRFS info (device dm-13): force clearing of disk cache > > [ 3080.120703] BTRFS info (device dm-13): disk space caching is > > enabled > > [ 3080.120706] BTRFS info (device dm-13): has skinny extents > > [ 3080.150957] BTRFS warning (device dm-13): missing devices (1) > > exceeds the limit (0), writeable mount is not allowed > > [ 3080.195941] BTRFS: open_ctree failed > > I have to wonder did you read the above message? What you need at this point > is simply "-o degraded,ro". But I don't see that tried anywhere down the > line. > > See also (or try): https://patchwork.kernel.org/patch/9419189/ Actually I read that one, but I read more into it than what it was saying: I read into it that BTRFS would automatically use a read only mount. merkaba:~> mount -o degraded,ro /dev/satafp1/daten /mnt/zeit actually really works. *Thank you*, Roman. I do think that above kernel messages invite such a kind of interpretation tough. I took the "BTRFS: open_ctree failed" message as indicative to some structural issue with the filesystem. So mounting work although for some reason scrubbing is aborted (I had this issue a long time ago on my laptop as well). After removing /var/lib/btrfs scrub status file for the filesystem: merkaba:~> btrfs scrub start /mnt/zeit scrub started on /mnt/zeit, fsid […] (pid=9054) merkaba:~> btrfs scrub status /mnt/zeit scrub status for […] scrub started at Wed Nov 16 11:52:56 2016 and was aborted after 00:00:00 total bytes scrubbed: 0.00B with 0 errors Anyway, I will now just rsync off the files. Interestingly enough btrfs restore complained about looping over certain files… lets see whether the rsync or btrfs send/receive proceeds through. Ciao, -- Martin Steigerwald | Trainer teamix GmbH Südwestpark 43 90449 Nürnberg Tel.: +49 911 30999 55 | Fax: +49 911 30999 99 mail: martin.steigerw...@teamix.de | web: http://www.teamix.de | blog: http://blog.teamix.de Amtsgericht Nürnberg, HRB 18320 | Geschäftsführer: Oliver Kügow, Richard Müller teamix Support Hotline: +49 911 30999-112 *** Bitte liken Sie uns auf Facebook: facebook.com/teamix *** -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
ice dm-13): has skinny extents [ 3080.150957] BTRFS warning (device dm-13): missing devices (1) exceeds the limit (0), writeable mount is not allowed [ 3080.195941] BTRFS: open_ctree failed merkaba:~> mount -o degraded,clear_cache,usebackuproot /dev/satafp1/backup /mnt/zeit mount: Falscher Dateisystemtyp, ungültige Optionen, der Superblock von /dev/mapper/satafp1-backup ist beschädigt, fehlende Kodierungsseite oder ein anderer Fehler Manchmal liefert das Systemprotokoll wertvolle Informationen – versuchen Sie dmesg | tail oder ähnlich merkaba:~> dmesg | tail -7 [ 3173.784713] BTRFS info (device dm-13): allowing degraded mounts [ 3173.784728] BTRFS info (device dm-13): force clearing of disk cache [ 3173.784737] BTRFS info (device dm-13): trying to use backup root at mount time [ 3173.784742] BTRFS info (device dm-13): disk space caching is enabled [ 3173.784746] BTRFS info (device dm-13): has skinny extents [ 3173.816983] BTRFS warning (device dm-13): missing devices (1) exceeds the limit (0), writeable mount is not allowed [ 3173.865199] BTRFS: open_ctree failed I aborted repairing after this assert: merkaba:~#130> btrfs check --repair /dev/satafp1/backup &| stdbuf -oL tee btrfs-check-repair-satafp1-backup.log enabling repair mode warning, device 2 is missing Checking filesystem on /dev/satafp1/backup UUID: 01cf0493-476f-42e8-8905-61ef205313db checking extents Unable to find block group for 0 extent-tree.c:289: find_search_start: Assertion `1` failed. btrfs[0x43e418] btrfs(btrfs_reserve_extent+0x5c9)[0x4425df] btrfs(btrfs_alloc_free_block+0x63)[0x44297c] btrfs(__btrfs_cow_block+0xfc)[0x436636] btrfs(btrfs_cow_block+0x8b)[0x436bd8] btrfs[0x43ad82] btrfs(btrfs_commit_transaction+0xb8)[0x43c5dc] btrfs[0x4268b4] btrfs(cmd_check+0x)[0x427d6d] btrfs(main+0x12f)[0x40a341] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7fb2e6bec2b1] btrfs(_start+0x2a)[0x40a37a] merkaba:~#130> btrfs --version btrfs-progs v4.7.3 (Honestly I think asserts like this need to be gone from btrfs-tools for good) About this I only found this unanswered mailing list post: btrfs-convert: Unable to find block group for 0 Date: Fri, 24 Jun 2016 11:09:27 +0200 https://www.spinics.net/lists/linux-btrfs/msg56478.html Out of curiosity I tried: merkaba:~#1> btrfs rescue zero-log //dev/satafp1/daten warning, device 2 is missing Clearing log on //dev/satafp1/daten, previous log_root 0, level 0 Unable to find block group for 0 extent-tree.c:289: find_search_start: Assertion `1` failed. btrfs[0x43e418] btrfs(btrfs_reserve_extent+0x5c9)[0x4425df] btrfs(btrfs_alloc_free_block+0x63)[0x44297c] btrfs(__btrfs_cow_block+0xfc)[0x436636] btrfs(btrfs_cow_block+0x8b)[0x436bd8] btrfs[0x43ad82] btrfs(btrfs_commit_transaction+0xb8)[0x43c5dc] btrfs[0x42c0d4] btrfs(main+0x12f)[0x40a341] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7fb2f16a82b1] btrfs(_start+0x2a)[0x40a37a] (I didn´t expect much as this is an issue that AFAIK does not happen easily anymore, but I also thought it could not do much harm) Superblocks themselves seem to be sane: merkaba:~#1> btrfs rescue super-recover //dev/satafp1/daten All supers are valid, no need to recover So "btrfs restore" it is: merkaba:[…]> btrfs restore -mxs /dev/satafp1/daten daten-restore This prints out a ton of: Trying another mirror Trying another mirror But it actually works. Somewhat, I now just got Trying another mirror We seem to be looping a lot on daten-restore/[…]/virtualbox-4.1.18-dfsg/out/lib/vboxsoap.a, do you want to keep going on ? (y/N/a): after about 35 GiB of data restored. I answered no to this one and now it is at about 53 GiB already. I just got another one of these, but also not concerning a file I actually need. Thanks, -- Martin Steigerwald | Trainer teamix GmbH Südwestpark 43 90449 Nürnberg Tel.: +49 911 30999 55 | Fax: +49 911 30999 99 mail: martin.steigerw...@teamix.de | web: http://www.teamix.de | blog: http://blog.teamix.de Amtsgericht Nürnberg, HRB 18320 | Geschäftsführer: Oliver Kügow, Richard Müller teamix Support Hotline: +49 911 30999-112 *** Bitte liken Sie uns auf Facebook: facebook.com/teamix *** -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: stability matrix
Am Donnerstag, 15. September 2016, 07:54:26 CEST schrieb Austin S. Hemmelgarn: > On 2016-09-15 05:49, Hans van Kranenburg wrote: > > On 09/15/2016 04:14 AM, Christoph Anton Mitterer wrote: […] > I specifically do not think we should worry about distro kernels though. > If someone is using a specific distro, that distro's documentation > should cover what they support and what works and what doesn't. Some > (like Arch and to a lesser extent Gentoo) use almost upstream kernels, > so there's very little point in tracking them. Some (like Ubuntu and > Debian) use almost upstream LTS kernels, so there's little point > tracking them either. Many others though (like CentOS, RHEL, and OEL) > Use forked kernels that have so many back-ported patches that it's > impossible to track up-date to up-date what the hell they've got. A > rather ridiculous expression regarding herding of cats comes to mind > with respect to the last group. Yep. I just read through RHEL releasenotes for a RHEL 7 workshop I will hold for a customer… and noted that newer RHEL 7 kernels for example have device mapper from Kernel 4.1 (while the kernel still says its a 3.10 one), XFS from kernel this.that, including new incompat CRC disk format and the need to also upgrade xfsprogs in lockstep, and this and that from kernel this.that and so on. Frankenstein as an association comes to my mind, but I bet RHEL kernel engineers know what they are doing. -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Is stability a joke?
Am Donnerstag, 15. September 2016, 07:55:36 CEST schrieb Kai Krakow: > Am Mon, 12 Sep 2016 08:20:20 -0400 > > schrieb "Austin S. Hemmelgarn" <ahferro...@gmail.com>: > > On 2016-09-11 09:02, Hugo Mills wrote: > > > On Sun, Sep 11, 2016 at 02:39:14PM +0200, Waxhead wrote: > > >> Martin Steigerwald wrote: > > [...] > > [...] > > [...] > > [...] > > > > >> That is exactly the same reason I don't edit the wiki myself. I > > >> could of course get it started and hopefully someone will correct > > >> what I write, but I feel that if I start this off I don't have deep > > >> enough knowledge to do a proper start. Perhaps I will change my > > >> mind about this. > > >> > > >Given that nobody else has done it yet, what are the odds that > > > > > > someone else will step up to do it now? I would say that you should > > > at least try. Yes, you don't have as much knowledge as some others, > > > but if you keep working at it, you'll gain that knowledge. Yes, > > > you'll probably get it wrong to start with, but you probably won't > > > get it *very* wrong. You'll probably get it horribly wrong at some > > > point, but even the more knowledgable people you're deferring to > > > didn't identify the problems with parity RAID until Zygo and Austin > > > and Chris (and others) put in the work to pin down the exact > > > issues. > > > > FWIW, here's a list of what I personally consider stable (as in, I'm > > willing to bet against reduced uptime to use this stuff on production > > systems at work and personal systems at home): > > 1. Single device mode, including DUP data profiles on single device > > without mixed-bg. > > 2. Multi-device raid0, raid1, and raid10 profiles with symmetrical > > devices (all devices are the same size). > > 3. Multi-device single profiles with asymmetrical devices. > > 4. Small numbers (max double digit) of snapshots, taken at infrequent > > intervals (no more than once an hour). I use single snapshots > > regularly to get stable images of the filesystem for backups, and I > > keep hourly ones of my home directory for about 48 hours. > > 5. Subvolumes used to isolate parts of a filesystem from snapshots. > > I use this regularly to isolate areas of my filesystems from backups. > > 6. Non-incremental send/receive (no clone source, no parent's, no > > deduplication). I use this regularly for cloning virtual machines. > > 7. Checksumming and scrubs using any of the profiles I've listed > > above. 8. Defragmentation, including autodefrag. > > 9. All of the compat_features, including no-holes and skinny-metadata. > > > > Things I consider stable enough that I'm willing to use them on my > > personal systems but not systems at work: > > 1. In-line data compression with compress=lzo. I use this on my > > laptop and home server system. I've never had any issues with it > > myself, but I know that other people have, and it does seem to make > > other things more likely to have issues. > > 2. Batch deduplication. I only use this on the back-end filesystems > > for my personal storage cluster, and only because I have multiple > > copies as a result of GlusterFS on top of BTRFS. I've not had any > > significant issues with it, and I don't remember any reports of data > > loss resulting from it, but it's something that people should not be > > using if they don't understand all the implications. > > I could at least add one "don't do it": > > Don't use BFQ patches (it's an IO scheduler) if you're using btrfs. > Some people like to use it especially for running VMs and desktops > because it provides very good interactivity while maintaining very good > throughput. But it completely destroyed my btrfs beyond repair at least > twice, either while actually using a VM (in VirtualBox) or during high > IO loads. I now stick to the deadline scheduler instead which provides > very good interactivity for me, too, and the corruptions didn't occur > again so far. > > The story with BFQ has always been the same: System suddenly freezes > during moderate to high IO until all processes stop working (no process > shows D state, tho). Only hard reboot possible. After rebooting, access > to some (unrelated) files may fail with "errno=-17 Object already > exists" which cannot be repaired. If it affects files needed during > boot, you are screwed because file system goes RO. This could be a further row in the table. And well… as for CFQ Jens Axboe currently works on bandwidth throttling patches *exactly* for the reason to provide more interactivity and fairness to I/O operations in between. Right now, Completely Fair in CFQ is a *huge* exaggeration, at least while you have a dd bs=1M thing running. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Is stability a joke?
Hello Nicholas. Am Mittwoch, 14. September 2016, 21:05:52 CEST schrieb Nicholas D Steeves: > On Mon, Sep 12, 2016 at 08:20:20AM -0400, Austin S. Hemmelgarn wrote: > > On 2016-09-11 09:02, Hugo Mills wrote: […] > > As far as documentation though, we [BTRFS] really do need to get our act > > together. It really doesn't look good to have most of the best > > documentation be in the distro's wikis instead of ours. I'm not trying to > > say the distros shouldn't be documenting BTRFS, but the point at which > > Debian (for example) has better documentation of the upstream version of > > BTRFS than the upstream project itself does, that starts to look bad. > > I would have loved to have this feature-to-stability list when I > started working on the Debian documentation! I started it because I > was saddened by number of horror story "adventures with btrfs" > articles and posts I had read about, combined with the perspective of > certain members within the Debian community that it was a toy fs. > > Are my contributions to that wiki of a high enough quality that I > can work on the upstream one? Do you think the broader btrfs > community is interested in citations and curated links to discussions? > > eg: if a company wants to use btrfs, they check the status page, see a > feature they want is still in the yellow zone of stabilisation, and > then follow the links to familiarise themselves with past discussions. > I imagine this would also help individuals or grad students more > quickly familiarise themselves with the available literature before > choosing a specific project. If regular updates from SUSE, STRATO, > Facebook, and Fujitsu are also publicly available the k.org wiki would > be a wonderful place to syndicate them! I definately think the quality of your contributions is high enough, others can also proofread and give in their experiences, so… By *all* means, go ahead *already*. It doesn´t fit all inside the table directly, I bet, *but* you can use footnotes or further explainations regarding features that need them with a headline per feature below the table and a link to it from within the table. Thank you! -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Is stability a joke? (wiki updated)
Am Dienstag, 13. September 2016, 07:28:38 CEST schrieb Austin S. Hemmelgarn: > On 2016-09-12 16:44, Chris Murphy wrote: > > On Mon, Sep 12, 2016 at 2:35 PM, Martin Steigerwald <mar...@lichtvoll.de> wrote: > >> Am Montag, 12. September 2016, 23:21:09 CEST schrieb Pasi Kärkkäinen: > >>> On Mon, Sep 12, 2016 at 09:57:17PM +0200, Martin Steigerwald wrote: > >>>> Am Montag, 12. September 2016, 18:27:47 CEST schrieb David Sterba: > >>>>> On Mon, Sep 12, 2016 at 04:27:14PM +0200, David Sterba wrote: […] > >>>>> https://btrfs.wiki.kernel.org/index.php/Status > >>>> > >>>> Great. > >>>> > >>>> I made to minor adaption. I added a link to the Status page to my > >>>> warning > >>>> in before the kernel log by feature page. And I also mentioned that at > >>>> the time the page was last updated the latest kernel version was 4.7. > >>>> Yes, thats some extra work to update the kernel version, but I think > >>>> its > >>>> beneficial to explicitely mention the kernel version the page talks > >>>> about. Everyone who updates the page can update the version within a > >>>> second. > >>> > >>> Hmm.. that will still leave people wondering "but I'm running Linux 4.4, > >>> not 4.7, I wonder what the status of feature X is.." > >>> > >>> Should we also add a column for kernel version, so we can add "feature X > >>> is > >>> known to be OK on Linux 3.18 and later".. ? Or add those to "notes" > >>> field, > >>> where applicable? > >> > >> That was my initial idea, and it may be better than a generic kernel > >> version for all features. Even if we fill in 4.7 for any of the features > >> that are known to work okay for the table. > >> > >> For RAID 1 I am willing to say it works stable since kernel 3.14, as this > >> was the kernel I used when I switched /home and / to Dual SSD RAID 1 on > >> this ThinkPad T520. > > > > Just to cut yourself some slack, you could skip 3.14 because it's EOL > > now, and just go from 4.4. > > That reminds me, we should probably make a point to make it clear that > this is for the _upstream_ mainline kernel versions, not for versions > from some arbitrary distro, and that people should check the distro's > documentation for that info. I´d do the following: Really state the first known to work stable kernel version for a feature. But before the table state this: 1) Instead of the first known to work stable kernel for a feature recommend to use the latest upstream kernel or alternatively the latest upstream LTS kernel for those users who want to play it a bit safer. 2) For stable distros such as SLES, RHEL, Ubuntu LTS, Debian Stable recommend to check distro documentation. Note that some distro kernels track upstream kernels quite closely like Debian backport kernel or Ubuntu kernel backports PPA. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Is stability a joke? (wiki updated)
Am Montag, 12. September 2016, 23:21:09 CEST schrieb Pasi Kärkkäinen: > On Mon, Sep 12, 2016 at 09:57:17PM +0200, Martin Steigerwald wrote: > > Am Montag, 12. September 2016, 18:27:47 CEST schrieb David Sterba: > > > On Mon, Sep 12, 2016 at 04:27:14PM +0200, David Sterba wrote: > > > > > I therefore would like to propose that some sort of feature / > > > > > stability > > > > > matrix for the latest kernel is added to the wiki preferably > > > > > somewhere > > > > > where it is easy to find. It would be nice to archive old matrix'es > > > > > as > > > > > well in case someone runs on a bit older kernel (we who use Debian > > > > > tend > > > > > to like older kernels). In my opinion it would make things bit > > > > > easier > > > > > and perhaps a bit less scary too. Remember if you get bitten badly > > > > > once > > > > > you tend to stay away from from it all just in case, if you on the > > > > > other > > > > > hand know what bites you can safely pet the fluffy end instead :) > > > > > > > > Somebody has put that table on the wiki, so it's a good starting > > > > point. > > > > I'm not sure we can fit everything into one table, some combinations > > > > do > > > > not bring new information and we'd need n-dimensional matrix to get > > > > the > > > > whole picture. > > > > > > https://btrfs.wiki.kernel.org/index.php/Status > > > > Great. > > > > I made to minor adaption. I added a link to the Status page to my warning > > in before the kernel log by feature page. And I also mentioned that at > > the time the page was last updated the latest kernel version was 4.7. > > Yes, thats some extra work to update the kernel version, but I think its > > beneficial to explicitely mention the kernel version the page talks > > about. Everyone who updates the page can update the version within a > > second. > > Hmm.. that will still leave people wondering "but I'm running Linux 4.4, not > 4.7, I wonder what the status of feature X is.." > > Should we also add a column for kernel version, so we can add "feature X is > known to be OK on Linux 3.18 and later".. ? Or add those to "notes" field, > where applicable? That was my initial idea, and it may be better than a generic kernel version for all features. Even if we fill in 4.7 for any of the features that are known to work okay for the table. For RAID 1 I am willing to say it works stable since kernel 3.14, as this was the kernel I used when I switched /home and / to Dual SSD RAID 1 on this ThinkPad T520. -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Is stability a joke? (wiki updated)
Am Montag, 12. September 2016, 18:27:47 CEST schrieb David Sterba: > On Mon, Sep 12, 2016 at 04:27:14PM +0200, David Sterba wrote: > > > I therefore would like to propose that some sort of feature / stability > > > matrix for the latest kernel is added to the wiki preferably somewhere > > > where it is easy to find. It would be nice to archive old matrix'es as > > > well in case someone runs on a bit older kernel (we who use Debian tend > > > to like older kernels). In my opinion it would make things bit easier > > > and perhaps a bit less scary too. Remember if you get bitten badly once > > > you tend to stay away from from it all just in case, if you on the other > > > hand know what bites you can safely pet the fluffy end instead :) > > > > Somebody has put that table on the wiki, so it's a good starting point. > > I'm not sure we can fit everything into one table, some combinations do > > not bring new information and we'd need n-dimensional matrix to get the > > whole picture. > > https://btrfs.wiki.kernel.org/index.php/Status Great. I made to minor adaption. I added a link to the Status page to my warning in before the kernel log by feature page. And I also mentioned that at the time the page was last updated the latest kernel version was 4.7. Yes, thats some extra work to update the kernel version, but I think its beneficial to explicitely mention the kernel version the page talks about. Everyone who updates the page can update the version within a second. -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Small fs
Am Sonntag, 11. September 2016, 19:46:32 CEST schrieb Hugo Mills: > On Sun, Sep 11, 2016 at 09:13:28PM +0200, Martin Steigerwald wrote: > > Am Sonntag, 11. September 2016, 16:44:23 CEST schrieb Duncan: > > > * Metadata, and thus mixed-bg, defaults to DUP mode on a single-device > > > filesystem (except on ssd where I actually still use it myself, and > > > recommend it except for ssds that do firmware dedupe). In mixed-mode > > > this means two copies of data as well, which halves the usable space. > > > > > > IOW, when using mixed-mode, which is recommended under a gig, and dup > > > replication which is then the single-device default, effective usable > > > space is **HALVED**, so 256 MiB btrfs size becomes 128 MiB usable. (!!) > > > > I don´t get this part. That is just *metadata* being duplicated, not the > > actual *data* inside the files. Or am I missing something here? > >In mixed mode, there's no distinction: Data and metadata both use > the same chunks. If those chunks are DUP, then both data and metadata > are duplicated, and you get half the space available. In german I´d say "autsch", in english according to pda.leo.org "ouch", to this. Okay, I just erased using mixed mode as an idea from my mind altogether :). Just like I think I will never use a BTRFS below 5 GiB. Well, with one exception, maybe on the eMMC flash of the new Omnia Turris router that I hope will arrive soon at my place. -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
compress=lzo safe to use? (was: Re: Trying to rescue my data :()
Am Sonntag, 26. Juni 2016, 13:13:04 CEST schrieb Steven Haigh: > On 26/06/16 12:30, Duncan wrote: > > Steven Haigh posted on Sun, 26 Jun 2016 02:39:23 +1000 as excerpted: > >> In every case, it was a flurry of csum error messages, then instant > >> death. > > > > This is very possibly a known bug in btrfs, that occurs even in raid1 > > where a later scrub repairs all csum errors. While in theory btrfs raid1 > > should simply pull from the mirrored copy if its first try fails checksum > > (assuming the second one passes, of course), and it seems to do this just > > fine if there's only an occasional csum error, if it gets too many at > > once, it *does* unfortunately crash, despite the second copy being > > available and being just fine as later demonstrated by the scrub fixing > > the bad copy from the good one. > > > > I'm used to dealing with that here any time I have a bad shutdown (and > > I'm running live-git kde, which currently has a bug that triggers a > > system crash if I let it idle and shut off the monitors, so I've been > > getting crash shutdowns and having to deal with this unfortunately often, > > recently). Fortunately I keep my root, with all system executables, etc, > > mounted read-only by default, so it's not affected and I can /almost/ > > boot normally after such a crash. The problem is /var/log and /home > > (which has some parts of /var that need to be writable symlinked into / > > home/var, so / can stay read-only). Something in the normal after-crash > > boot triggers enough csum errors there that I often crash again. > > > > So I have to boot to emergency mode and manually mount the filesystems in > > question, so nothing's trying to access them until I run the scrub and > > fix the csum errors. Scrub itself doesn't trigger the crash, thankfully, > > and once it has repaired all the csum errors due to partial writes on one > > mirror that either were never made or were properly completed on the > > other mirror, I can exit emergency mode and complete the normal boot (to > > the multi-user default target). As there's no more csum errors then > > because scrub fixed them all, the boot doesn't crash due to too many such > > errors, and I'm back in business. > > > > > > Tho I believe at least the csum bug that affects me may only trigger if > > compression is (or perhaps has been in the past) enabled. Since I run > > compress=lzo everywhere, that would certainly affect me. It would also > > explain why the bug has remained around for quite some time as well, > > since presumably the devs don't run with compression on enough for this > > to have become a personal itch they needed to scratch, thus its remaining > > untraced and unfixed. > > > > So if you weren't using the compress option, your bug is probably > > different, but either way, the whole thing about too many csum errors at > > once triggering a system crash sure does sound familiar, here. > > Yes, I was running the compress=lzo option as well... Maybe here lays a > common problem? Hmm… I found this from being referred to by reading Debian wiki page on BTRFS¹. I use compress=lzo on BTRFS RAID 1 since April 2014 and I never found an issue. Steven, your filesystem wasn´t RAID 1 but RAID 5 or 6? I just want to assess whether using compress=lzo might be dangerous to use in my setup. Actually right now I like to keep using it, since I think at least one of the SSDs does not compress. And… well… /home and / where I use it are both quite full already. [1] https://wiki.debian.org/Btrfs#WARNINGS Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Small fs
Am Sonntag, 11. September 2016, 21:56:07 CEST schrieb Imran Geriskovan: > On 9/11/16, Duncan <1i5t5.dun...@cox.net> wrote: > > Martin Steigerwald posted on Sun, 11 Sep 2016 17:32:44 +0200 as excerpted: > >>> What is the smallest recommended fs size for btrfs? > >>> Can we say size should be in multiples of 64MB? > >> > >> Do you want to know the smalled *recommended* or the smallest *possible* > >> size? > > In fact both. > I'm reconsidering my options for /boot Well my stance on boot still is: Ext4. Done. :) It just does not bother me. It practically makes no difference at all. It has no visible effect on my user experience and I never saw the need to snapshot / boot. But another approach in case you want to use BTRFS for /boot is to use a subvolume. Thats IMHO the SLES 12 default setup. They basically create subvolumes for /boot, /var, /var/lib/mysql – you name it. Big advantage: You have one big FS and do not need to plan space for partitions or LVs. Disadvantage: If it breaks, it breaks. That said, I think at a new installation I may do this for /boot. Just put it inside a subvolume. >From my experiences at work with customer systems and even some systems I setup myself, I often do not use little partitions anymore. I did so for a CentOS 7 training VM, just 2 GiB XFS for /var. Guess what happened? Last update was too long ago, so… yum tried to download a ton of packages and then complained it has not enough space in /var. Luckily I used LVM, so I enlarged partition LVM resides on, enlarged PV and then enlarged /var. There may be valid reasons to split things up, and I am quite comfortable with splitting / boot out, cause its, well, plannable easily enough. And it may make sense to split /var or /var/log out. But on BTRFS I would likely use subvolumes. Only thing I may separate would be /home to make it easier on a re-installation of the OS to keep it around. That said, I never ever reinstalled the Debian on this ThinkPad T520 since I initially installed it. And on previous laptops I even copied the Debian on the older laptop onto the newer laptop. With the T520 I reinstalled, cause I wanted to switch to 64 bit cleanly. -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Small fs
Am Sonntag, 11. September 2016, 16:44:23 CEST schrieb Duncan: > * Metadata, and thus mixed-bg, defaults to DUP mode on a single-device > filesystem (except on ssd where I actually still use it myself, and > recommend it except for ssds that do firmware dedupe). In mixed-mode > this means two copies of data as well, which halves the usable space. > > IOW, when using mixed-mode, which is recommended under a gig, and dup > replication which is then the single-device default, effective usable > space is **HALVED**, so 256 MiB btrfs size becomes 128 MiB usable. (!!) I don´t get this part. That is just *metadata* being duplicated, not the actual *data* inside the files. Or am I missing something here? -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Small fs
Am Sonntag, 11. September 2016, 18:27:30 CEST schrieben Sie: > What is the smallest recommended fs size for btrfs? > > - There are mentions of 256MB around the net. > - Gparted reserves minimum of 256MB for btrfs. > > With an ordinary partition on a single disk, > fs created with just "mkfs.btrfs /dev/sdxx": > - 128MB works fine. > - 127MB works but as if it is 64MB. > > Can we say size should be in multiples of 64MB? Do you want to know the smalled *recommended* or the smallest *possible* size? I personally wouldn´t go below one or two GiB or or so with BTRFS. On small filesystems, I don´t know the treshold right now it uses a mixed metadata/data format. And I think using smaller BTRFS filesystem invited any left over "filesystem is full while it isn´t" issues. Well there we go. Excerpt from mkfs.btrfs(8) manpage: -M|--mixed Normally the data and metadata block groups are isolated. The mixed mode will remove the isolation and store both types in the same block group type. This helps to utilize the free space regardless of the purpose and is suitable for small devices. The separate allocation of block groups leads to a situation where the space is reserved for the other block group type, is not available for allocation and can lead to ENOSPC state. The recommended size for the mixed mode is for filesystems less than 1GiB. The soft recommendation is to use it for filesystems smaller than 5GiB. The mixed mode may lead to degraded performance on larger filesystems, but is otherwise usable, even on multiple devices. The nodesize and sectorsize must be equal, and the block group types must match. Note versions up to 4.2.x forced the mixed mode for devices smaller than 1GiB. This has been removed in 4.3+ as it caused some usability issues. Thanks -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Is stability a joke?
Am Sonntag, 11. September 2016, 16:54:25 CEST schrieben Sie: > Am Sonntag, 11. September 2016, 14:39:14 CEST schrieb Waxhead: > > Martin Steigerwald wrote: > > > Am Sonntag, 11. September 2016, 13:43:59 CEST schrieb Martin Steigerwald: > > >>>>> The Nouveau graphics driver have a nice feature matrix on it's > > >>>>> webpage > > >>>>> and I think that BTRFS perhaps should consider doing something like > > >>>>> that > > >>>>> on it's official wiki as well > > >>>> > > >>>> BTRFS also has a feature matrix. The links to it are in the "News" > > >>>> section > > >>>> however: > > >>>> > > >>>> https://btrfs.wiki.kernel.org/index.php/Changelog#By_feature > > […] > > > > I mentioned this matrix as a good *starting* point. And I think it would > > > be > > > easy to extent it: > > > > > > Just add another column called "Production ready". Then research / ask > > > about production stability of each feature. The only challenge is: Who > > > is > > > authoritative on that? I´d certainly ask the developer of a feature, but > > > I´d also consider user reports to some extent. > > > > > > Maybe thats the real challenge. > > > > > > If you wish, I´d go through each feature there and give my own > > > estimation. > > > But I think there are others who are deeper into this. > > > > That is exactly the same reason I don't edit the wiki myself. I could of > > course get it started and hopefully someone will correct what I write, > > but I feel that if I start this off I don't have deep enough knowledge > > to do a proper start. Perhaps I will change my mind about this. > > Well one thing would be to start with the column and start filling the more > easy stuff. And if its not known since what kernel version, but its known to > be stable I suggest to conservatively just put the first kernel version > into it where people think it is stable or in doubt even put 4.7 into it. > It can still be reduced to lower kernel versions. > > Well: I made a tiny start. I linked "Features by kernel version" more > prominently on the main page, so it is easier to find and also added the > following warning just above the table: > > "WARNING: The "Version" row states at which version a feature has been > merged into the mainline kernel. It does not tell anything about at which > kernel version it is considered mature enough for production use." > > Now I wonder: Would adding a "Production ready" column, stating the first > known to be stable kernel version make sense in this table? What do you > think? I can add the column and give some first rough, conservative > estimations on a few features. > > What do you think? Is this a good place? It isn´t as straight forward to add this column as I thought. If I add it after "Version" then the following fields are not aligned anymore, even tough they use some kind of identifier – but that identifier also doesn´t match the row title. After reading about mediawiki syntax I came to the conclusion that I need to add the new column in every data row as well and cannot just assign values to the rows and leave out whats not known yet. ! Feature !! Version !! Description !! Notes {{FeatureMerged |name=scrub |version=3.0 |text=Read all data and verify checksums, repair if possible. }} Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Is stability a joke?
Am Sonntag, 11. September 2016, 13:02:21 CEST schrieb Hugo Mills: > On Sun, Sep 11, 2016 at 02:39:14PM +0200, Waxhead wrote: > > Martin Steigerwald wrote: > > >Am Sonntag, 11. September 2016, 13:43:59 CEST schrieb Martin Steigerwald: > > >>>>Thing is: This just seems to be when has a feature been implemented > > >>>>matrix. > > >>>>Not when it is considered to be stable. I think this could be done > > >>>>with > > >>>>colors or so. Like red for not supported, yellow for implemented and > > >>>>green for production ready. > > >>> > > >>>Exactly, just like the Nouveau matrix. It clearly shows what you can > > >>>expect from it. > > > > > >I mentioned this matrix as a good *starting* point. And I think it would > > >be > > >easy to extent it: > > > > > >Just add another column called "Production ready". Then research / ask > > >about production stability of each feature. The only challenge is: Who > > >is authoritative on that? I´d certainly ask the developer of a feature, > > >but I´d also consider user reports to some extent. > > > > > >Maybe thats the real challenge. > > > > > >If you wish, I´d go through each feature there and give my own > > >estimation. But I think there are others who are deeper into this. > > > > That is exactly the same reason I don't edit the wiki myself. I > > could of course get it started and hopefully someone will correct > > what I write, but I feel that if I start this off I don't have deep > > enough knowledge to do a proper start. Perhaps I will change my mind > > about this. > >Given that nobody else has done it yet, what are the odds that > someone else will step up to do it now? I would say that you should at > least try. Yes, you don't have as much knowledge as some others, but > if you keep working at it, you'll gain that knowledge. Yes, you'll > probably get it wrong to start with, but you probably won't get it > *very* wrong. You'll probably get it horribly wrong at some point, but > even the more knowledgable people you're deferring to didn't identify > the problems with parity RAID until Zygo and Austin and Chris (and > others) put in the work to pin down the exact issues. > >So I'd strongly encourage you to set up and maintain the stability > matrix yourself -- you have the motivation at least, and the knowledge > will come with time and effort. Just keep reading the mailing list and > IRC and bugzilla, and try to identify where you see lots of repeated > problems, and where bugfixes in those areas happen. > >So, go for it. You have a lot to offer the community. Yep! Fully agreed. -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Is stability a joke?
Am Sonntag, 11. September 2016, 14:39:14 CEST schrieb Waxhead: > Martin Steigerwald wrote: > > Am Sonntag, 11. September 2016, 13:43:59 CEST schrieb Martin Steigerwald: > >>>>> The Nouveau graphics driver have a nice feature matrix on it's webpage > >>>>> and I think that BTRFS perhaps should consider doing something like > >>>>> that > >>>>> on it's official wiki as well > >>>> > >>>> BTRFS also has a feature matrix. The links to it are in the "News" > >>>> section > >>>> however: > >>>> > >>>> https://btrfs.wiki.kernel.org/index.php/Changelog#By_feature […] > > I mentioned this matrix as a good *starting* point. And I think it would > > be > > easy to extent it: > > > > Just add another column called "Production ready". Then research / ask > > about production stability of each feature. The only challenge is: Who is > > authoritative on that? I´d certainly ask the developer of a feature, but > > I´d also consider user reports to some extent. > > > > Maybe thats the real challenge. > > > > If you wish, I´d go through each feature there and give my own estimation. > > But I think there are others who are deeper into this. > > That is exactly the same reason I don't edit the wiki myself. I could of > course get it started and hopefully someone will correct what I write, > but I feel that if I start this off I don't have deep enough knowledge > to do a proper start. Perhaps I will change my mind about this. Well one thing would be to start with the column and start filling the more easy stuff. And if its not known since what kernel version, but its known to be stable I suggest to conservatively just put the first kernel version into it where people think it is stable or in doubt even put 4.7 into it. It can still be reduced to lower kernel versions. Well: I made a tiny start. I linked "Features by kernel version" more prominently on the main page, so it is easier to find and also added the following warning just above the table: "WARNING: The "Version" row states at which version a feature has been merged into the mainline kernel. It does not tell anything about at which kernel version it is considered mature enough for production use." Now I wonder: Would adding a "Production ready" column, stating the first known to be stable kernel version make sense in this table? What do you think? I can add the column and give some first rough, conservative estimations on a few features. What do you think? Is this a good place? Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Is stability a joke?
Am Sonntag, 11. September 2016, 14:30:51 CEST schrieb Waxhead: > > I think what would be a good next step would be to ask developers / users > > about feature stability and then update the wiki. If thats important to > > you, I suggest you invest some energy in doing that. And ask for help. > > This mailinglist is a good idea. > > > > I already gave you my idea on what works for me. > > > > There is just one thing I won´t go further even a single step: The > > complaining path. As it leads to no desirable outcome. > > > > Thanks, > > My intention was not to be hostile and if my response sound a bit harsh > for you then by all means I do apologize for that. Okay, maybe I read something into your mail that you didn´t intend to put there. Sorry. Let us focus on the constructive way to move forward with this. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Is stability a joke?
Am Sonntag, 11. September 2016, 13:43:59 CEST schrieb Martin Steigerwald: > > >> The Nouveau graphics driver have a nice feature matrix on it's webpage > > >> and I think that BTRFS perhaps should consider doing something like > > >> that > > >> on it's official wiki as well > > > > > > BTRFS also has a feature matrix. The links to it are in the "News" > > > section > > > however: > > > > > > https://btrfs.wiki.kernel.org/index.php/Changelog#By_feature > > > > I disagree, this is not a feature / stability matrix. It is a clearly a > > changelog by kernel version. > > It is a *feature* matrix. I fully said its not about stability, but about > implementation – I just wrote this a sentence after this one. There is no > need whatsoever to further discuss this as I never claimed that it is a > feature / stability matrix in the first place. > > > > Thing is: This just seems to be when has a feature been implemented > > > matrix. > > > Not when it is considered to be stable. I think this could be done with > > > colors or so. Like red for not supported, yellow for implemented and > > > green for production ready. > > > > Exactly, just like the Nouveau matrix. It clearly shows what you can > > expect from it. I mentioned this matrix as a good *starting* point. And I think it would be easy to extent it: Just add another column called "Production ready". Then research / ask about production stability of each feature. The only challenge is: Who is authoritative on that? I´d certainly ask the developer of a feature, but I´d also consider user reports to some extent. Maybe thats the real challenge. If you wish, I´d go through each feature there and give my own estimation. But I think there are others who are deeper into this. I do think for example that scrubbing and auto raid repair are stable, except for RAID 5/6. Also device statistics and RAID 0 and 1 I consider to be stable. I think RAID 10 is also stable, but as I do not run it, I don´t know. For me also skinny-metadata is stable. For me so far even compress=lzo seems to be stable, but well for others it may not. Since what kernel version? Now, there you go. I have no idea. All I know I started BTRFS with Kernel 2.6.38 or 2.6.39 on my laptop, but not as RAID 1 at that time. See, the implementation time of a feature is much easier to assess. Maybe thats part of the reason why there is not stability matrix: Maybe no one *exactly* knows *for sure*. How could you? So I would even put a footnote on that "production ready" row explaining "Considered to be stable by developer and user oppinions". Of course additionally it would be good to read about experiences of corporate usage of BTRFS. I know at least Fujitsu, SUSE, Facebook, Oracle are using it. But I don´t know in what configurations and with what experiences. One Oracle developer invests a lot of time to bring BTRFS like features to XFS and RedHat still favors XFS over BTRFS, even SLES defaults to XFS for /home and other non /-filesystems. That also tells a story. Some ideas you can get from SUSE releasenotes. Even if you do not want to use it, it tells something and I bet is one of the better sources of information regarding your question you can get at this time. Cause I believe SUSE developers invested some time to assess the stability of features. Cause they would carefully assess what they can support in enterprise environments. There is also someone from Fujitsu who shared experiences in a talk, I can search the URL to the slides again. I bet Chris Mason and other BTRFS developers at Facebook have some idea on what they use within Facebook as well. To what extent they are allowed to talk about it… I don´t know. My personal impression is that as soon as Chris went to Facebook he became quite quiet. Maybe just due to being busy. Maybe due to Facebook being concerned much more about the privacy of itself than of its users. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Is stability a joke?
Am Sonntag, 11. September 2016, 13:21:30 CEST schrieb Zoiled: > Martin Steigerwald wrote: > > Am Sonntag, 11. September 2016, 10:55:21 CEST schrieb Waxhead: > >> I have been following BTRFS for years and have recently been starting to > >> use BTRFS more and more and as always BTRFS' stability is a hot topic. > >> Some says that BTRFS is a dead end research project while others claim > >> the opposite. > > > > First off: On my systems BTRFS definately runs too stable for a research > > project. Actually: I have zero issues with stability of BTRFS on *any* of > > my systems at the moment and in the last half year. > > > > The only issue I had till about half an year ago was BTRFS getting stuck > > at > > seeking free space on a highly fragmented RAID 1 + compress=lzo /home. > > This > > went away with either kernel 4.4 or 4.5. > > > > Additionally I never ever lost even a single byte of data on my own BTRFS > > filesystems. I had a checksum failure on one of the SSDs, but BTRFS RAID 1 > > repaired it. > > > > > > Where do I use BTRFS? > > > > 1) On this ThinkPad T520 with two SSDs. /home and / in RAID 1, another > > data > > volume as single. In case you can read german, search blog.teamix.de for > > BTRFS. > > > > 2) On my music box ThinkPad T42 for /home. I did not bother to change / so > > far and may never to so for this laptop. It has a slow 2,5 inch harddisk. > > > > 3) I used it on Workstation at work as well for a data volume in RAID 1. > > But workstation is no more (not due to a filesystem failure). > > > > 4) On a server VM for /home with Maildirs and Owncloud data. /var is still > > on Ext4, but I want to migrate it as well. Whether I ever change /, I > > don´t know. > > > > 5) On another server VM, a backup VM which I currently use with > > borgbackup. > > With borgbackup I actually wouldn´t really need BTRFS, but well… > > > > 6) On *all* of my externel eSATA based backup harddisks for snapshotting > > older states of the backups. > > In other words, you are one of those who claim the opposite :) I have > also myself run btrfs for a "toy" filesystem since 2013 without any > issues, but this is more or less irrelevant since some people have > experienced data loss thanks to unstable features that are not clearly > marked as such. > And making a claim that you have not lost a single byte of data does not > make sense, how did you test this? SHA256 against a backup? :) Do you have any proof like that with *any* other filesystem on Linux? No, my claim is a bit weaker: BTRFS own scrubbing feature and well no I/O errors on rsyncing my data over to the backup drive - BTRFS checks checksum on read as well –, and yes I know BTRFS uses a weaker hashing algorithm, I think crc32c. Yet this is still more than what I can say about *any* other filesystem I used so far. Up to my current knowledge neither XFS nor Ext4/3 provide data checksumming. They do have metadata checksumming and I found contradicting information on whether XFS may support data checksumming in the future, but up to now, no *proof* *whatsoever* from side of the filesystem that the data is, what it was when I saved it initially. There may be bit errors rotting on any of your Ext4 and XFS filesystem without you even noticing for *years*. I think thats still unlikely, but it can happen, I have seen this years ago after restoring a backup with bit errors from a hardware RAID controller. Of course, I rely on the checksumming feature within BTRFS – which may have errors. But even that is more than with any other filesystem I had before. And I do not scrub daily, especially not the backup disks, but for any scrubs up to now, no issues. So, granted, my claim has been a bit bold. Right now I have no up-to-this-day scrubs so all I can say is that I am not aware of any data losses up to the point in time where I last scrubbed my devices. Just redoing the scrubbing now on my laptop. > >> The Debian wiki for BTRFS (which is recent by the way) contains a bunch > >> of warnings and recommendations and is for me a bit better than the > >> official BTRFS wiki when it comes to how to decide what features to use. > > > > Nice page. I wasn´t aware of this one. > > > > If you use BTRFS with Debian, I suggest to usually use the recent backport > > kernel, currently 4.6. > > > > Hmmm, maybe I better remove that compress=lzo mount option. Never saw any > > issue with it, tough. Will research what they say about it. > > My point exactly: You did not know about this and hence the risk of your > data being gnawed on. Well I do follow B
Re: Is stability a joke?
Am Sonntag, 11. September 2016, 10:55:21 CEST schrieb Waxhead: > I have been following BTRFS for years and have recently been starting to > use BTRFS more and more and as always BTRFS' stability is a hot topic. > Some says that BTRFS is a dead end research project while others claim > the opposite. First off: On my systems BTRFS definately runs too stable for a research project. Actually: I have zero issues with stability of BTRFS on *any* of my systems at the moment and in the last half year. The only issue I had till about half an year ago was BTRFS getting stuck at seeking free space on a highly fragmented RAID 1 + compress=lzo /home. This went away with either kernel 4.4 or 4.5. Additionally I never ever lost even a single byte of data on my own BTRFS filesystems. I had a checksum failure on one of the SSDs, but BTRFS RAID 1 repaired it. Where do I use BTRFS? 1) On this ThinkPad T520 with two SSDs. /home and / in RAID 1, another data volume as single. In case you can read german, search blog.teamix.de for BTRFS. 2) On my music box ThinkPad T42 for /home. I did not bother to change / so far and may never to so for this laptop. It has a slow 2,5 inch harddisk. 3) I used it on Workstation at work as well for a data volume in RAID 1. But workstation is no more (not due to a filesystem failure). 4) On a server VM for /home with Maildirs and Owncloud data. /var is still on Ext4, but I want to migrate it as well. Whether I ever change /, I don´t know. 5) On another server VM, a backup VM which I currently use with borgbackup. With borgbackup I actually wouldn´t really need BTRFS, but well… 6) On *all* of my externel eSATA based backup harddisks for snapshotting older states of the backups. > The Debian wiki for BTRFS (which is recent by the way) contains a bunch > of warnings and recommendations and is for me a bit better than the > official BTRFS wiki when it comes to how to decide what features to use. Nice page. I wasn´t aware of this one. If you use BTRFS with Debian, I suggest to usually use the recent backport kernel, currently 4.6. Hmmm, maybe I better remove that compress=lzo mount option. Never saw any issue with it, tough. Will research what they say about it. > The Nouveau graphics driver have a nice feature matrix on it's webpage > and I think that BTRFS perhaps should consider doing something like that > on it's official wiki as well BTRFS also has a feature matrix. The links to it are in the "News" section however: https://btrfs.wiki.kernel.org/index.php/Changelog#By_feature Thing is: This just seems to be when has a feature been implemented matrix. Not when it is considered to be stable. I think this could be done with colors or so. Like red for not supported, yellow for implemented and green for production ready. Another hint you can get by reading SLES 12 releasenotes. SUSE dares to support BTRFS since quite a while – frankly, I think for SLES 11 SP 3 this was premature, at least for the initial release without updates, I have a VM that with BTRFS I can break very easily having BTRFS say it is full, while it is has still 2 GB free. But well… this still seems to happen for some people according to the threads on BTRFS mailing list. SUSE doesn´t support all of BTRFS. They even put features they do not support behind a "allow_unsupported=1" module option: https://www.suse.com/releasenotes/x86_64/SUSE-SLES/12/#fate-314697 But they even seem to contradict themselves by claiming they support RAID 0, RAID 1 and RAID10, but not RAID 5 or RAID 6, but then putting RAID behind that module option – or I misunderstood their RAID statement "Btrfs is supported on top of MD (multiple devices) and DM (device mapper) configurations. Use the YaST partitioner to achieve a proper setup. Multivolume Btrfs is supported in RAID0, RAID1, and RAID10 profiles in SUSE Linux Enterprise 12, higher RAID levels are not yet supported, but might be enabled with a future service pack." and they only support BTRFS on MD for RAID. They also do not support compression yet. They even do not support big metadata. https://www.suse.com/releasenotes/x86_64/SUSE-SLES/12/#fate-317221 Interestingly enough RedHat only supports BTRFS as a technology preview, even with RHEL 7. > For example something along the lines of (the statuses are taken > our of thin air just for demonstration purposes) I´d say feel free to work with the feature matrix already there and fill in information about stability. I think it makes sense tough to discuss first on how to do it with still keeping it manageable. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kworker threads may be working saner now instead of using 100% of a CPU core for minutes (Re: Still not production ready)
Am Mittwoch, 7. September 2016, 11:53:04 CEST schrieb Christian Rohmann: > On 03/20/2016 12:24 PM, Martin Steigerwald wrote: > >> btrfs kworker thread uses up 100% of a Sandybridge core for minutes on > >> > >> > random write into big file > >> > https://bugzilla.kernel.org/show_bug.cgi?id=90401 > > > > I think I saw this up to kernel 4.3. I think I didn´t see this with 4.4 > > anymore and definately not with 4.5. > > > > So it may be fixed. > > > > Did anyone else see kworker threads using 100% of a core for minutes with > > 4.4 / 4.5? > > I run 4.8rc5 and currently see this issue. kworking has been running at > 100% for hours now, seems stuck there. > > Anything I should look at in order to narrow this down to a root cause? I didn´t see any issues since my last post, currently running 4.8-rc5 myself. I suggest you look at kernel log and probably review this thread and my bug report for what other information I came up with. Particulary in my case the issue only happened when BTRFS allocated all device spaces into chunks, but the space in the chunks was not fully used up yet. I.e. when BTRFS had to seek for new space in chunks and couldn´t just allocate a new chunk anymore. In addition to that your BTRFS configuration, storage configuration, yada. Just review what I reported to get an idea. If you are sufficiently sure that your issue is the same from looking at the kernel log… so if the backtraces look sufficiently similar, then I´d add to my bug report. Otherwise I´d hope a new one. Good luck. -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dd on wrong device, 1.9 GiB from the beginning has been overwritten, how to restore partition?
Hi Maximilian, On Sonntag, 12. Juni 2016 23:22:11 CEST Maximilian Böhm wrote: > Hi there, I did something terribly wrong, all blame on me. I wanted to > write to an USB stick but /dev/sdc wasn't the stick in this case but > an attached HDD with GPT and an 8 TB btrfs partition… > > $ sudo dd bs=4M if=manjaro-kde-16.06.1-x86_64.iso of=/dev/sdc > 483+1 Datensätze ein > 483+1 Datensätze aus > 2028060672 bytes (2,0 GB, 1,9 GiB) copied, 16,89 s, 120 MB/s > > So, shit. > > $ sudo btrfs check --repair /dev/sdc > enabling repair mode > No valid Btrfs found on /dev/sdc > Couldn't open file system > > $ sudo btrfs-find-root /dev/sdc > No valid Btrfs found on /dev/sdc > ERROR: open ctree failed > > $ sudo btrfs-show-super /dev/sdc --all > superblock: bytenr=65536, device=/dev/sdc > - > ERROR: bad magic on superblock on /dev/sdc at 65536 > > superblock: bytenr=67108864, device=/dev/sdc > - > ERROR: bad magic on superblock on /dev/sdc at 67108864 > > superblock: bytenr=274877906944, device=/dev/sdc > - > ERROR: bad magic on superblock on /dev/sdc at 274877906944 > > > System infos: > > $ uname -a > Linux Mongo 4.6.2-1-MANJARO #1 SMP PREEMPT Wed Jun 8 11:00:08 UTC 2016 > x86_64 GNU/Linux > > $ btrfs --version > btrfs-progs v4.5.3 > > Don't think dmesg is necessary here. > > > OK, the btrfs wiki says there is a second superblock at 64 MiB > (overwritten too in my case) and a third at 256 GiB ("0x40"). > But how to restore it? And how to restore the general btrfs header > metadata? How to restore GPT without doing something terrible again? > Would be glad for any help! But it says bad magic on that one as well. Well, no idea if there is any chance to fix BTRFS in this situation. Does btrfs restore do anything useful to copy of what if can find from this device? It does not work in place, so you need an additional space to let it restore to. If BTRFS cannot be salvaged… you can still have a go with photorec, but it will not recover filenames and directory structure, just the data of any file of a file in a known format that it finds in one piece. I suspect you have no backup. So *good* luck. I do think tough that dd should just bail out or warn for a BTRFS filesystem that is still mounted, or wasn´t it mounted at that time? I also think it would be good to add an existing filesystem check just like in mkfs.btrfs, mkfs.xfs and so on. I´d like that, but that would be a suggestions for the coreutils people. Yes, Unix is for people who know what they are doing… unless they don´t. And in the end even one of the most experienced admin could make such a mistake. Goodnight, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFE: 'btrfs' tools machine readable output
Hello Richard, On Montag, 16. Mai 2016 13:14:56 CEST Richard W.M. Jones wrote: > I don't have time to implement this right now, so I'm just posting > this as a suggestion/request ... > > It would be really helpful if the btrfs tools had a machine-readable > output. > > Libguestfs parses btrfs tools output in a number of places, eg: > https://github.com/libguestfs/libguestfs/blob/master/daemon/btrfs.c > This is a massive PITA because each time a new release of btrfs-progs > comes along it changes the output slightly, and we end up having > to add all sorts of hacks. > > With machine-readable output, there'd be a flag which would > change the output. eg: I wonder whether parsing a text based output is really the most elegant method here. How about a libbtrfs so that other tools can benefit from btrfs tools functionality? This was also desktop environments wishing to make use of snapshot functionality or advanced disk usage reporting for example can easily make use of it without calling external commands. Of course it would likely me more effort than to implement structured output. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs send/receive using generation number as source
On Freitag, 8. April 2016 11:12:54 CEST Hugo Mills wrote: > On Fri, Apr 08, 2016 at 01:01:03PM +0200, Martin Steigerwald wrote: > > Hello! > > > > As far as I understood, for differential btrfs send/receive – I didn´t use > > it yet – I need to keep a snapshot on the source device to then tell > > btrfs send to send the differences between the snapshot and the current > > state. > > > > Now the BTRFS filesystems on my SSDs are often quite full, thus I do not > > keep any snapshots except for one during rsync or borgbackup script > > run-time. > > > > Is it possible to tell btrfs send to use generation number xyz to > > calculate > > the difference? This way, I wouldn´t have to keep a snapshot around, I > > believe. > >btrfs sub find-new > >BUT that will only tell you which files have been added or updated. > It won't tell you which files have been deleted. It's also unrelated > to send/receive, so you'd have to roll your own solution. I am aware of this one. > > I bet not, at the time cause -c wants a snapshot. Ah and it wants a > > snapshot of the same state on the destination as well. Well on the > > destination I let the script make a snapshot after the backup so… > > what I would need is to remember the generation number of the source > > snapshot that the script creates to backup from and then tell btrfs > > send that generation number + the destination snapshots. > > > > Well, or get larger SSDs or get rid of some data on them. > >Those are the other options, of course. Hm, I see. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
btrfs send/receive using generation number as source
Hello! As far as I understood, for differential btrfs send/receive – I didn´t use it yet – I need to keep a snapshot on the source device to then tell btrfs send to send the differences between the snapshot and the current state. Now the BTRFS filesystems on my SSDs are often quite full, thus I do not keep any snapshots except for one during rsync or borgbackup script run-time. Is it possible to tell btrfs send to use generation number xyz to calculate the difference? This way, I wouldn´t have to keep a snapshot around, I believe. I bet not, at the time cause -c wants a snapshot. Ah and it wants a snapshot of the same state on the destination as well. Well on the destination I let the script make a snapshot after the backup so… what I would need is to remember the generation number of the source snapshot that the script creates to backup from and then tell btrfs send that generation number + the destination snapshots. Well, or get larger SSDs or get rid of some data on them. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: csum errors in VirtualBox VDI files
On Dienstag, 22. März 2016 09:03:42 CEST Kai Krakow wrote: > Hello! > > Since one of the last kernel updates (I don't know which exactly), I'm > experiencing csum errors within VDI files when running VirtualBox. A > side effect of this is, as soon as dmesg shows these errors, commands > like "du" and "df" hang until reboot. > > I've now restored the file from backup but it happens over and over > again. Just as another data point I am irregularily using a VM with Virtualbox in a VDI file on a BTRFS RAID 1 on two SSDs and had no such issues so far up to kernel 4.5. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
On Dienstag, 15. März 2016 08:07:22 CEST Marc Haber wrote: > On Mon, Mar 14, 2016 at 09:39:51PM +0100, Henk Slager wrote: > > >> BTW, I restored and mounted your 20160307-fanbtr-image: > > >> > > >> [266169.207952] BTRFS: device label fanbtr devid 1 transid 22215732 > > >> /dev/loop0 [266203.734804] BTRFS info (device loop0): disk space > > >> caching is enabled [266203.734806] BTRFS: has skinny extents > > >> [266204.022175] BTRFS: checking UUID tree > > >> [266239.407249] attempt to access beyond end of device > > >> [266239.407252] loop0: rw=1073, want=715202688, limit=70576 > > >> [266239.407254] BTRFS error (device loop0): bdev /dev/loop0 errs: wr > > >> 1, rd 0, flush 0, corrupt 0, gen 0 > > >> [266239.407272] attempt to access beyond end of device > > >> .. and 16 more > > >> > > >> As a quick fix/workaround, I truncated the image to 1T > > > > > > The original fs was 417 GiB in size. What size does the image claim? > > > > ls -alFh of the restored image showed 337G I remember. > > btrfs fi us showed also a number over 400G, I don't have the > > files/loopdev anymore. > > sounds legit. > > > It could some side effect of btrfs-image, I only have used it for > > multi-device, where dev id's are ignore, but total image size did not > > lead to problems. > > The original "ofanbtr" seems to have a problem, since btrfs check > > /media/tempdisk says: > > > [10/509]mh@fan:~$ sudo btrfs check /media/tempdisk/ > > > Superblock bytenr is larger than device size > > > Couldn't open file system > > > [11/509]mh@fan:~$ > > > > > > Can this be fixed? > > > > What I would do in order to fix it, is resize the fs to let's say > > 190GiB. That should write correct values to the superblocks I /hope/. > > And then resize back to max. > > It doesn't: > [20/518]mh@fan:~$ sudo btrfs filesystem resize 300G /media/tempdisk/ > Resize '/media/tempdisk/' of '300G' > [22/520]mh@fan:~$ sudo btrfs check /media/tempdisk/ > Superblock bytenr is larger than device size > Couldn't open file system > [23/521]mh@fan:~$ df -h Are you trying the check on the *mounted* filesystem? "media/tempdisk" appears to be a mount point, not a device file. Unmount it and use the / one device file of the filesystem to check. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: unable to mount btrfs partition, please help :(
On Sonntag, 20. März 2016 10:18:26 CET Patrick Tschackert wrote: > > I think in retrospect the safe way to do these kinds of Virtual Box > > updates, which require kernel module updates, would have been to > > shutdown the VM and stop the array. *shrug* > > > After this, I think I'll just do away with the virtual machine on this host, > as the app contained in that vm can also run on the host. I tried to be > fancy, and it seems to needlessly complicate things. I am not completely sure and I have no exact reference anymore, but I think I read more than once about fs benchmarks running faster in Virtualbox than on the physical system, which may point at an at least incomplete fsync() implementation for writing into Virtualbox image files. I never found any proof of this nor did I specificially seeked to research it. So it may be true or not. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: unable to mount btrfs partition, please help :(
On Samstag, 19. März 2016 19:34:55 CET Chris Murphy wrote: > >>> $ uname -a > >>> Linux vmhost 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u4 > >>> (2016-02-29) x86_64 GNU/Linux > >> > >>This is old. You should upgrade to something newer, ideally 4.5 but > >>4.4.6 is good also, and then oldest I'd suggest is 4.1.20. > >> > > Shouldn't I be able to get the newest kernel by executing "apt-get update > > && apt-get dist-upgrade"? That's what I ran just now, and it doesn't > > install a newer kernel. Do I really have to manually upgrade to a newer > > one? > I'm not sure. You might do a list search for debian, as I know debian > users are using newer kernels that they didn't build themselves. Try a backport¹ kernel. Add backports and do apt-cache search linux-image I use 4.3 backport kernel successfully on two server VMs which use BTRFS. [1] http://backports.debian.org/ Thx, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] Experimental btrfs encryption
On Mittwoch, 2. März 2016 09:06:57 CET Qu Wenruo wrote: > And maybe I just missed something, but the filename seems not touched, > meaning it will leak a lot of information. > Just like default eCryptfs behavior. > > I understand that's an easy design and it's not a high priority thing, > but I hope we can encrypt the subvolume tree blocks too, if using > per-subvolume policy. > To provide a feature near block-level encryption. I´d really love an approach to at least optionally be able to hide the metadata structure completely except for which blocks on the block device are allocated. I.e. not just encrypting filenames, but encrypting the directory structure, amount of files, their dates, their sizes. I am not sure whether BTRFS can allow this and still be at least btrfs check´able without unlocking the encryption key. Ideally this could even be backuped by an btrfs send/ receive as a kind opaque stream. This would excel BTRFS encryption support over anything thats available with Ext4, F2FS, ecryptfs and encfs. It would ideal for having encryption on SSD, no need to encrypted unallocated blocks, but still most of the advantages of block level encryption, even of some would argue that you can find something out when you check which blocks are allocated or not, and of course the total size of the subvolume and which chunks it allocates are known. I would the this as requirement for any initial approach and be happy about anything that does file name encryption like ecryptfs or the Ext4/F2FS approach, but if the subvolume specifics of BTRFS can be used to encrypted more of the metadata then even better! Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
kworker threads may be working saner now instead of using 100% of a CPU core for minutes (Re: Still not production ready)
On Sonntag, 13. Dezember 2015 23:35:08 CET Martin Steigerwald wrote: > Hi! > > For me it is still not production ready. Again I ran into: > > btrfs kworker thread uses up 100% of a Sandybridge core for minutes on > random write into big file > https://bugzilla.kernel.org/show_bug.cgi?id=90401 I think I saw this up to kernel 4.3. I think I didn´t see this with 4.4 anymore and definately not with 4.5. So it may be fixed. Did anyone else see kworker threads using 100% of a core for minutes with 4.4 / 4.5? For me this would be a big step forward. And yes, I am aware some people have new and other issues, but well for me a non working balance – it may also be broken here with "no space left on device", it errored out often enough here – is still something different than having to switch off the device hard unless you want to give it a ton of time to eventually shutdown which is not an option if you just want to work with your system. In any case many thanks to all the developers working on improving BTRFS, and especially those who bring in bug fixes. I do think BTRFS still needs more stability work when I read through the recent mailing list threads. Thanks, Martin > No matter whether SLES 12 uses it as default for root, no matter whether > Fujitsu and Facebook use it: I will not let this onto any customer machine > without lots and lots of underprovisioning and rigorous free space > monitoring. Actually I will renew my recommendations in my trainings to be > careful with BTRFS. > > From my experience the monitoring would check for: > > merkaba:~> btrfs fi show /home > Label: 'home' uuid: […] > Total devices 2 FS bytes used 156.31GiB > devid1 size 170.00GiB used 164.13GiB path /dev/mapper/msata-home > devid2 size 170.00GiB used 164.13GiB path /dev/mapper/sata-home > > If "used" is same as "size" then make big fat alarm. It is not sufficient > for it to happen. It can run for quite some time just fine without any > issues, but I never have seen a kworker thread using 100% of one core for > extended period of time blocking everything else on the fs without this > condition being met. > > > In addition to that last time I tried it aborts scrub any of my BTRFS > filesstems. Reported in another thread here that got completely ignored so > far. I think I could go back to 4.2 kernel to make this work. > > > I am not going to bother to go into more detail on any on this, as I get the > impression that my bug reports and feedback get ignored. So I spare myself > the time to do this work for now. > > > Only thing I wonder now whether this all could be cause my /home is already > more than one and a half year old. Maybe newly created filesystems are > created in a way that prevents these issues? But it already has a nice > global reserve: > > merkaba:~> btrfs fi df / > Data, RAID1: total=27.98GiB, used=24.07GiB > System, RAID1: total=19.00MiB, used=16.00KiB > Metadata, RAID1: total=2.00GiB, used=536.80MiB > GlobalReserve, single: total=192.00MiB, used=0.00B > > > Actually when I see that this free space thing is still not fixed for good I > wonder whether it is fixable at all. Is this an inherent issue of BTRFS or > more generally COW filesystem design? > > I think it got somewhat better. It took much longer to come into that state > again than last time, but still, blocking like this is *no* option for a > *production ready* filesystem. > > > > I am seriously consider to switch to XFS for my production laptop again. > Cause I never saw any of these free space issues with any of the XFS or > Ext4 filesystems I used in the last 10 years. > > Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Again, no space left on device while rebalancing and recipe doesnt work
On Samstag, 27. Februar 2016 22:14:50 CET Marc Haber wrote: > Hi, Hi Marc. > I have again the issue of no space left on device while rebalancing > (with btrfs-tools 4.4.1 on kernel 4.4.2 on Debian unstable): > > mh@fan:~$ sudo btrfs balance start /mnt/fanbtr > ERROR: error during balancing '/mnt/fanbtr': No space left on device > mh@fan:~$ sudo btrfs fi show /mnt/fanbtr > mh@fan:~$ sudo btrfs fi show -m > Label: 'fanbtr' uuid: 4198d1bc-e3ce-40df-a7ee-44a2d120bff3 > Total devices 1 FS bytes used 116.49GiB > devid1 size 417.19GiB used 177.06GiB path /dev/mapper/fanbtr Hmmm, thats still a ton of space to allocate chunks from. > mh@fan:~$ sudo btrfs fi df /mnt/fanbtr > Data, single: total=113.00GiB, used=112.77GiB > System, DUP: total=32.00MiB, used=48.00KiB > Metadata, DUP: total=32.00GiB, used=3.72GiB > GlobalReserve, single: total=512.00MiB, used=0.00B > mh@fan:~$ > > The filesystem was recently resized from 300 GB to 420 GB. > > Why does btrfs fi show /mnt/fanbtr not give any output? Wy does btrfs > fi df /mnt/fanbtr say that my data space is only 113 GiB large? Cause it is. The "used" in "devid 1" line is btrfs fi sh is "data + 2x system + 2x metadata = 113 GiB + 2 * 32 GiB + 2 * 32 MiB, i.e. what amount of the size of the device is allocated for chunks. The value one line above is what is allocated inside the chunks. I.e. the line in "devid 1" is "total" of btrfs fi df summed up, and the line above is "used" in btrfs fi df summed up. And… with more devices you have more fun. I suggest: merkaba:~> btrfs fi usage -T /daten Overall: Device size: 235.00GiB Device allocated:227.04GiB Device unallocated:7.96GiB Device missing: 0.00B Used:225.84GiB Free (estimated): 8.48GiB (min: 8.48GiB) Data ratio: 1.00 Metadata ratio: 1.00 Global reserve: 128.00MiB (used: 0.00B) Data Metadata System Id Path singlesinglesingle Unallocated -- - - - --- 1 /dev/dm-1 226.00GiB 1.01GiB 32.00MiB 7.96GiB -- - - - --- Total 226.00GiB 1.01GiB 32.00MiB 7.96GiB Used 225.48GiB 371.83MiB 48.00KiB as that is much clearer to read IMHO. and merkaba:~> btrfs device usage /daten /dev/dm-1, ID: 1 Device size: 235.00GiB Data,single: 226.00GiB Metadata,single: 1.01GiB System,single: 32.00MiB Unallocated: 7.96GiB (although thats include in the filesystem usage output) Or for a BTRFS RAID 1: merkaba:~> btrfs fi usage -T /home Overall: Device size: 340.00GiB Device allocated:340.00GiB Device unallocated:2.00MiB Device missing: 0.00B Used:306.47GiB Free (estimated): 14.58GiB (min: 14.58GiB) Data ratio: 2.00 Metadata ratio: 2.00 Global reserve: 512.00MiB (used: 0.00B) Data Metadata System Id Path RAID1 RAID1RAID1Unallocated -- - - --- 1 /dev/dm-0 163.94GiB 6.03GiB 32.00MiB 1.00MiB 2 /dev/dm-3 163.94GiB 6.03GiB 32.00MiB 1.00MiB -- - - --- Total 163.94GiB 6.03GiB 32.00MiB 2.00MiB Used 149.36GiB 3.88GiB 48.00KiB merkaba:~> btrfs device usage /home /dev/dm-0, ID: 1 Device size: 170.00GiB Data,RAID1:163.94GiB Metadata,RAID1: 6.03GiB System,RAID1: 32.00MiB Unallocated: 1.00MiB /dev/dm-3, ID: 2 Device size: 170.00GiB Data,RAID1:163.94GiB Metadata,RAID1: 6.03GiB System,RAID1: 32.00MiB Unallocated: 1.00MiB (this is actually the situation asking for hung task trouble with kworker threads seeking for free space inside chunks, as no new chunks can be allocated, lets hope kernel 4.4 finally really has fixes for this) > btrfs balance start -dusage=5 works up to -dusage=100: > > mh@fan:~$ sudo btrfs balance start -dusage=100 /mnt/fanbtr > Done, had to relocate 111 out of 179 chunks > mh@fan:~$ sudo btrfs balance start -dusage=100 /mnt/fanbtr > Done, had to relocate 111 out of 179 chunks > mh@fan:~$ sudo btrfs balance start -dusage=100 /mnt/fanbtr > Done, had to relocate 110 out of 179 chunks > mh@fan:~$ sudo btrfs balance start -dusage=100 /mnt/fanbtr > Done, had to relocate 109 out of 179 chunks > mh@fan:~$ sudo btrfs balance start /mnt/fanbtr > ERROR: error during balancing '/mnt/fanbtr': No space left on device > mh@fan:~$ > > What is going on here? How do I get away from here? Others may have
Re: Use fast device only for metadata?
Am Sonntag, 7. Februar 2016, 21:07:13 CET schrieb Kai Krakow: > Am Sun, 07 Feb 2016 11:06:58 -0800 > > schrieb Nikolaus Rath: > > Hello, > > > > I have a large home directory on a spinning disk that I regularly > > synchronize between different computers using unison. That takes ages, > > even though the amount of changed files is typically small. I suspect > > most if the time is spend walking through the file system and checking > > mtimes. > > > > So I was wondering if I could possibly speed-up this operation by > > storing all btrfs metadata on a fast, SSD drive. It seems that > > mkfs.btrfs allows me to put the metadata in raid1 or dup mode, and the > > file contents in single mode. However, I could not find a way to tell > > btrfs to use a device *only* for metadata. Is there a way to do that? > > > > Also, what is the difference between using "dup" and "raid1" for the > > metadata? > > You may want to try bcache. It will speedup random access which is > probably the main cause for your slow sync. Unfortunately it requires > you to reformat your btrfs partitions to add a bcache superblock. But > it's worth the efforts. > > I use a nightly rsync to USB3 disk, and bcache reduced it from 5+ hours > to typically 1.5-3 depending on how much data changed. An alternative is using dm-cache, I think it doesn´t need to recreate the filesystem. I wonder what happened to the VFS hot data tracking stuff patchset floating around here quite some time ago. -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs-progs and btrfs(8) inconsistencies
Am Donnerstag, 4. Februar 2016, 09:57:54 CET schrieb Moviuro: > > Although personally I like to let all the backward compatibility > > things go hell, but that's definitely not how things work. :( > > > > 2) End-user taste. > > Some end-users like such info as feedback of success. > > Of course other users like it act as silent as possible. > > I'm pretty sure that's... not the case. Almost everything on GNU/Linux > is silent. cd(1) is silent, cp(1) is silent, rm(1)... > What they all have though is a -v|--verbose switch. The various mkfs commands are not. Not one of them I know of. Additionally each one gives a different output. pvcreate, vgcreate, lvcreate and as well as the remove commands and probably other LVM commands are not (no one could argue, that from their ideas they come from HP/UX, but thats a Unix as well): merkaba:~> lvcreate -L 1G -n bla sata Logical volume "bla" created. And I think, not testing right now, that also mdadm is not silent on creating a softraid. So while I agree with you that regular shell commands (coreutils, util-linux) are silent on success usually this does not appear to be the case with storage related commands in GNU/Linux. I don´t have a clear oppinion about it other than I´d like to see some standard too. coreutils / util-linux both them to have some kind of standard, although not necessarily the same standard I bet. And I am not sure whether it is documented somewhere. -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Btrfs Check - "type mismatch with chunk"
Am Dienstag, 5. Januar 2016, 15:34:35 CET schrieb Duncan: > Christoph Anton Mitterer posted on Sat, 02 Jan 2016 06:12:46 +0100 as > > excerpted: > > On Fri, 2015-12-25 at 08:06 +, Duncan wrote: > >> I wasn't personally sure if 4.1 itself was affected or not, but the > >> wiki says don't use 4.1.1 as it's broken with this bug, with the > >> quick-fix in 4.1.2, so I /think/ 4.1 itself is fine. A scan with a > >> current btrfs check should tell you for sure. But if you meant 4.1.1 > >> and only typed 4.1, then yes, better redo. > > > > What exactly was that bug in 4.1.1 mkfs and how would one notice that > > one suffers from it? > > I created a number of personal filesystems that I use "productively" and > > I'm not 100% sure during which version I've created them... :/ > > > > > > > > Is there some easy way to find out, like a fs creation time stamp?? > > I believe a current btrfs check will flag the errors, but can't fix them, > as the problem was in the filesystem creation and is simply too deep to > fix, so the bad filesystems must be wiped and recreated with a mkfs.btrfs > without the bug, to fix. btrfs check from btrfs tools 4.3.1 on kernel 4.4-rc6 has not been able to fix these errors and I recreated the filesystem that had the errors. I think I mentioned it also in this thread. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs scrub failing
Am Sonntag, 3. Januar 2016, 17:33:03 CET schrieb John Center: > Hi Martin, Hi John, > One thing I forgot, I did run btrfs-image & it appears to have successfully > completed afaict. Do you think it would be useful to someone for future > troubleshooting? I leave that to the devs to decide. Maybe, if its not too large you can keep it for a while and ask the devs whether they want it. But, as it contains the type mismatch thing that they already know and fixed in mkfs.btrfs, if may not be of much use to them. Thank you, Martin > > Thanks. > > -John > > Sent from my iPhone > > > On Jan 3, 2016, at 5:06 AM, Martin Steigerwald <mar...@lichtvoll.de> > > wrote: > > > > Am Sonntag, 3. Januar 2016, 02:02:12 CET schrieb John Center: > >> Hi Martin & Duncan, > > > > Hi John, > > > >> Since I had a backup of my data, I first ran "btrfs check -p" on the > >> unmounted array. It first found 3 parent transid errors: > >> > >> root@ubuntu:~# btrfs check -p /dev/md126p2 > >> Checking filesystem on /dev/md126p2 > >> UUID: 9b5a6959-7df1-4455-a643-d369487d24aa > >> parent transid verify failed on 97763328 wanted 33736296 found 181864 > >> ... > >> Ignoring transid failure > >> parent transid verify failed on 241287168 wanted 33554449 found 17 > >> ... > >> Ignoring transid failure > >> parent transid verify failed on 1016217600 wanted 33556071 found 1639 > >> ... > >> Ignoring transid failure > >> > >> Then a huge number of bad extent mismatches: > >> > >> bad extent [29360128, 29376512), type mismatch with chunk > >> bad extent [29376512, 29392896), type mismatch with chunk > >> ... > >> bad extent [1039947448320, 1039947464704), type mismatch with chunk > >> bad extent [1039948005376, 1039948021760), type mismatch with chunk > > > > Due to these I recommend you redo the BTRFS filesystem using your backup. > > See the other thread where Duncan explained the situation that this may > > be a sign of a filesystem corruption introduced by a faulty mkfs.btrfs > > version. > > > > I had this yesterday with one of my BTRFS filesystems and these type > > mismatch things didn´t go away with btrfs check --repair from btrfs-tools > > 4.3.1. > > > > Also > > > >> Next: > >> > >> Couldn't find free space inode 1 > >> checking free space cache [o] > >> parent transid verify failed on 241287168 wanted 33554449 found 17 > >> Ignoring transid failure > >> checkingunresolved ref dir 418890 index 0 namelen 15 name umq-onetouch.ko > >> filetype 1 errors 6, no dir index, no inode ref > >> > >>unresolved ref dir 418890 index 8 namelen 15 name ums-onetouch.ko > >> > >> filetype 1 errors 1, no dir item > > > > the further errors and > > > > […] > > > >> Once it finished, I tried a recovery mount, which went ok. Since I > >> already > >> had a backup of my data, I tried to run btrfs repair: > >> […] > >> Then it got stuck on the same error as before. It appears to be a loop: > >> > >> parent transid verify failed on 1016217600 wanted 33556071 found 1639 > >> Ignoring transid failure > >> parent transid verify failed on 1016217600 wanted 33556071 found 1639 > >> Ignoring transid failure > >> ... > > > > […] > > > >> It's been running this way for over an hour now, never moving on from the > >> same errors & the same couple of files. I'm going to let it run > >> overnight, > >> but I don't have a lot of confidence that it will ever exit this loop. > >> Any > >> recommendations as what I should do next? > > > > is a clear sign to me that it likely is more effective to just redo the > > filesystem from scratch than trying to repair it with the limited > > capabilities of current btrfs check command. > > > > So when you have a good backup of your data and want to be confident of a > > sound structure of the filesytem, redo it from scratch with latest > > btrfs-tools 4.3.1. > > > > Thats at least my take on this. > > > > Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs scrub failing
Am Samstag, 2. Januar 2016, 18:27:16 CET schrieb John Center: > Hi Martin, > > > On Jan 2, 2016, at 6:41 AM, Martin Steigerwald <mar...@lichtvoll.de> > > wrote: > > Am Samstag, 2. Januar 2016, 11:35:51 CET schrieb Martin Steigerwald: > >> Am Freitag, 1. Januar 2016, 20:04:43 CET schrieb John Center: > >>> Hi Duncan, > >>> > >>>> On Fri, Jan 1, 2016 at 12:05 PM, Duncan <1i5t5.dun...@cox.net> wrote: > >>> > >>>> John Center posted on Fri, 01 Jan 2016 11:41:20 -0500 as excerpted: > >>> Where do I go from here? > >> > >> These and the other errors point at an issue with the filesystem > > structure. > > >> As I never had to deal with that, I can only give generic advice: > >> > >> 1) Use latest stable btrfs-progs. > > I'm in the process of creating a live USB to boot with. Since I'm running > mdadm (imsm) I need to purge dmraid & install mdadm to assemble the drives > first. I also need to put the latest version of btrfs-progs on it, too. > (As a side note, things have been getting flaky with my workstation, so I > guess I'm either going to fix this or rebuild it. I have the data files > backed up, it's just a pain to have to recreate the system again.) I think you could even just run a GRML from USB stick and grab the sources via git clone and compile them there. Shouldn´t take long. I use: git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git which has 4.3.1. > >> 2) Umount the filesystem and run > >> > >> btrfs check (maybe with -p) > >> > >> When it finds some errors, proceed with the following steps: > >> > >> Without --repair or some of the other options that modify things it is > > read > > >> only. > >> > >> 3) If you can still access all the files, first thing to do is: rsync or > >> otherwise backup them all to a different location, before attempting > >> anything to repair the issue. > >> > >> 4) If you can´t access some files, you may try to use btrfs restore for > >> restoring them. > >> > >> 5) Then, if you made sure you have an up-to-date backup run > >> > >> btrfs --repair > > > > Before doing that, review: > > > > https://btrfs.wiki.kernel.org/index.php/Btrfsck > > > > to learn about other options. > > Ok, so "btrfs check -p" first to understand how bad the filesystem is > corrupted. > Should I then try to do a recovery mount, or should I run "btrfs check > --repair -p" to try & fix it? I'm not sure what a "mount -o ro,recovery" > does. Well, if "btrfs check -p" confirms that something needs fixing, you make sure you have a working backup *first*, either by rsync or btrfs restore if some files are inaccesible, but from what I remember you are able to access all files? Then I´d say give btrfs check --repair a try. I don´t know much about that mount -o ro,recovery does but from what I gathered so far I thought it is only needed when you *can´t* mount the filesystem anymore. > If I have to reformat & reinstall Ubuntu, are there any recommended > mkfs.btrfs options I should use? Something that might help prevent > problems in the future? Lol. :) As far as I see mkfs.btrfs from btrfs-tools 4.3.1 already sets extref and skinny-metadata as well as 16 KiB node and leaf size by default, so as I recreated my /daten BTRFS filesystem due to the extent type mismatch errors (see other thread) I used use mkfs.btrfs -L daten to set a label. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs scrub failing
Am Sonntag, 3. Januar 2016, 02:02:12 CET schrieb John Center: > Hi Martin & Duncan, Hi John, > Since I had a backup of my data, I first ran "btrfs check -p" on the > unmounted array. It first found 3 parent transid errors: > > root@ubuntu:~# btrfs check -p /dev/md126p2 > Checking filesystem on /dev/md126p2 > UUID: 9b5a6959-7df1-4455-a643-d369487d24aa > parent transid verify failed on 97763328 wanted 33736296 found 181864 > ... > Ignoring transid failure > parent transid verify failed on 241287168 wanted 33554449 found 17 > ... > Ignoring transid failure > parent transid verify failed on 1016217600 wanted 33556071 found 1639 > ... > Ignoring transid failure > > Then a huge number of bad extent mismatches: > > bad extent [29360128, 29376512), type mismatch with chunk > bad extent [29376512, 29392896), type mismatch with chunk > ... > bad extent [1039947448320, 1039947464704), type mismatch with chunk > bad extent [1039948005376, 1039948021760), type mismatch with chunk Due to these I recommend you redo the BTRFS filesystem using your backup. See the other thread where Duncan explained the situation that this may be a sign of a filesystem corruption introduced by a faulty mkfs.btrfs version. I had this yesterday with one of my BTRFS filesystems and these type mismatch things didn´t go away with btrfs check --repair from btrfs-tools 4.3.1. Also > Next: > > Couldn't find free space inode 1 > checking free space cache [o] > parent transid verify failed on 241287168 wanted 33554449 found 17 > Ignoring transid failure > checkingunresolved ref dir 418890 index 0 namelen 15 name umq-onetouch.ko > filetype 1 errors 6, no dir index, no inode ref > unresolved ref dir 418890 index 8 namelen 15 name ums-onetouch.ko > filetype 1 errors 1, no dir item the further errors and […] > Once it finished, I tried a recovery mount, which went ok. Since I already > had a backup of my data, I tried to run btrfs repair: > […] > Then it got stuck on the same error as before. It appears to be a loop: > > parent transid verify failed on 1016217600 wanted 33556071 found 1639 > Ignoring transid failure > parent transid verify failed on 1016217600 wanted 33556071 found 1639 > Ignoring transid failure > ... […] > It's been running this way for over an hour now, never moving on from the > same errors & the same couple of files. I'm going to let it run overnight, > but I don't have a lot of confidence that it will ever exit this loop. Any > recommendations as what I should do next? is a clear sign to me that it likely is more effective to just redo the filesystem from scratch than trying to repair it with the limited capabilities of current btrfs check command. So when you have a good backup of your data and want to be confident of a sound structure of the filesytem, redo it from scratch with latest btrfs-tools 4.3.1. Thats at least my take on this. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Unrecoverable fs corruption?
Am Sonntag, 3. Januar 2016, 15:53:56 CET schrieben Sie: > [1] Fat-fingering a deletion: My own brown-bag "I became an admin that > day" case was running a script, unfortunately as root, that I was > debugging, where I did an rm -rf $somevar/*, with $somevar assigned > earlier, only either the somevar in the assignment or the somevar in the > rm line was typoed, so the var ended up empty and the command ended up as > rm -rf /*. ... > > I was *SO* glad I had a backup, not just a raid1, that day! Epic. Thats the one case GNU rm doesn´t cover yet. It refuses to rm -rf . or rm -rf .. and rm -rf / (unless you give special argument, but there is not much it can do about rm -r /*, as the shell expands this before handing it to the command. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs scrub failing
Am Freitag, 1. Januar 2016, 20:04:43 CET schrieb John Center: > Hi Duncan, > > On Fri, Jan 1, 2016 at 12:05 PM, Duncan <1i5t5.dun...@cox.net> wrote: > > John Center posted on Fri, 01 Jan 2016 11:41:20 -0500 as excerpted: > >> If this doesn't resolve the problem, what would you recommend my next > >> steps should be? I've been hesitant to run too many of the btrfs-tools, > >> mainly because I don't want to accidentally screw things up & I don't > >> always know how to interpret the results. (I ran btrfs-debug-tree, > >> hoping something obvious would show up. Big mistake. ) > > > > LOLed at that debug-tree remark. Been there (with other tools) myself. > > > > Well, I'm hoping someone who had the problem can confirm whether it's > > fixed in current kernels (scrub is one of those userspace commands that's > > mostly just a front-end to the kernel code which does the real work, so > > kernel version is the important thing for scrub). I'm guessing so, and > > that you'll find the problem gone in 4.3. > > > > We'll cross the not-gone bridge if we get to it, but again, if the other > > people who had the similar problem can confirm whether it disappeared for > > them with the new kernel, it would help a lot, as there were enough such > > reports that if it's the same problem and still there for everyone (which > > I doubt as I expect there'd still be way more posts about it if so, but > > confirmation's always good), nothing to do but wait for a fix, while if > > not, and you still have your problem, then it's a different issue and the > > devs will need to work with you on a fix specific to your problem. > > Ok, I'm at the next bridge. :-( I upgraded the kernel to 4.4rc7 from > the Ubuntu Mainline archive & I just ran the scrub: > > john@mariposa:~$ sudo /sbin/btrfs scrub start -BdR /dev/md125p2 > ERROR: scrubbing /dev/md125p2 failed for device id 1: ret=-1, errno=5 > (Input/output error) > scrub device /dev/md125p2 (id 1) canceled > scrub started at Fri Jan 1 19:38:21 2016 and was aborted after 00:02:34 > data_extents_scrubbed: 111031 > tree_extents_scrubbed: 104061 > data_bytes_scrubbed: 2549907456 > tree_bytes_scrubbed: 1704935424 > read_errors: 0 > csum_errors: 0 > verify_errors: 0 > no_csum: 1573 > csum_discards: 0 > super_errors: 0 > malloc_errors: 0 > uncorrectable_errors: 0 > unverified_errors: 0 > corrected_errors: 0 > last_physical: 4729667584 > > I checked dmesg & this appeared: > > [11428.983355] BTRFS error (device md125p2): parent transid verify > failed on 241287168 wanted 33554449 found 17 > [11431.028399] BTRFS error (device md125p2): parent transid verify > failed on 241287168 wanted 33554449 found 17 > > Where do I go from here? These and the other errors point at an issue with the filesystem structure. As I never had to deal with that, I can only give generic advice: 1) Use latest stable btrfs-progs. 2) Umount the filesystem and run btrfs check (maybe with -p) When it finds some errors, proceed with the following steps: Without --repair or some of the other options that modify things it is read only. 3) If you can still access all the files, first thing to do is: rsync or otherwise backup them all to a different location, before attempting anything to repair the issue. 4) If you can´t access some files, you may try to use btrfs restore for restoring them. 5) Then, if you made sure you have an up-to-date backup run btrfs --repair Also watch out for other guidance you may receive her. My approach is based on what I would do. I never had the need to repair a BTRFS filesystem so far. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] BTRFS: Adds the files and options needed for Hybrid Storage
Hello, Am Freitag, 1. Januar 2016, 22:08:32 CET schrieb Sanidhya Solanki: > This patch adds the file required for Hybrid Storage. It contains > the memory, time and size limits for the cache and the statistics that > will be provided while the cache is operating. > It also adds the Makefile changes needed to add the Hybrid Storage. Is this about what I think it is – using flash as cache for a BTRFS filesystem on rotational disk? I ask cause the last time I saw patched regarding they consisted of patches to add hot data tracking to VFS and BTRFS to support setting up an SSD to use for hot data. Or is this something different? Happy New Year and thanks, Martin > Signed-off-by: Sanidhya Solanki> --- > fs/btrfs/Makefile | 2 +- > fs/btrfs/cache.c | 58 > +++ 2 files changed, 59 > insertions(+), 1 deletion(-) > create mode 100644 fs/btrfs/cache.c > > diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile > index 6d1d0b9..dc56ae4 100644 > --- a/fs/btrfs/Makefile > +++ b/fs/btrfs/Makefile > @@ -9,7 +9,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o > root-tree.o dir-item.o \ export.o tree-log.o free-space-cache.o zlib.o > lzo.o \ > compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \ > reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \ > -uuid-tree.o props.o hash.o > +uuid-tree.o props.o hash.o cache.o > > btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o > btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o > diff --git a/fs/btrfs/cache.c b/fs/btrfs/cache.c > new file mode 100644 > index 000..0ece7a1 > --- /dev/null > +++ b/fs/btrfs/cache.c > @@ -0,0 +1,58 @@ > +/* > + * (c) Sanidhya Solanki, 2016 > + * > + * Licensed under the FSF's GNU Public License v2 or later. > + */ > +#include > + > +/* Cache size configuration )in MiB).*/ > +#define MAX_CACHE_SIZE = 1 > +#define MIN_CACHE_SIZE = 10 > + > +/* Time (in seconds)before retrying to increase the cache size.*/ > +#define CACHE_RETRY = 10 > + > +/* Space required to be free (in MiB) before increasing the size of the > + * cache. If cache size is less than cache_grow_limit, a block will be > freed + * from the cache to allow the cache to continue growning. > + */ > +#define CACHE_GROW_LIMIT = 100 > + > +/* Size required to be free (in MiB) after we shrink the cache, so that it > + * does not grow in size immediately. > + */ > +#define CACHE_SHRINK_FREE_SPACE_LIMIT = 100 > + > +/* Age (in seconds) of oldest and newest block in the cache.*/ > +#define MAX_AGE_LIMIT = 300 /* Five Minute Rule recommendation, > + * optimum size depends on size of data > + * blocks. > + */ > +#define MIN_AGE_LIMIT = 15 /* In case of cache stampede.*/ > + > +/* Memory constraints (in percentage) before we stop caching.*/ > +#define MIN_MEM_FREE = 10 > + > +/* Cache statistics. */ > +struct cache_stats { > + u64 cache_size; > + u64 maximum_cache_size_attained; > + int cache_hit_rate; > + int cache_miss_rate; > + u64 cache_evicted; > + u64 duplicate_read; > + u64 duplicate_write; > + int stats_update_interval; > +}; > + > +#define cache_size CACHE_SIZE /* Current cache size.*/ > +#define max_cache_size MAX_SIZE /* Max cache limit. */ > +#define min_cache_size MIN_SIZE /* Min cache limit.*/ > +#define cache_time MAX_TIME /* Maximum time to keep data in cache.*/ > +#define evicted_csum EVICTED_CSUM/* Checksum of the evited data > + * (to avoid repeatedly caching > + * data that was just evicted. > + */ > +#define read_csumREAD_CSUM /* Checksum of the read data.*/ > +#define write_csum WRITE_CSUM /* Checksum of the written data.*/ > +#define evict_interval EVICT_INTERVAL /* Time to keep data before > eviction.*/ -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs scrub failing
Am Samstag, 2. Januar 2016, 11:35:51 CET schrieb Martin Steigerwald: > Am Freitag, 1. Januar 2016, 20:04:43 CET schrieb John Center: > > Hi Duncan, > > > > On Fri, Jan 1, 2016 at 12:05 PM, Duncan <1i5t5.dun...@cox.net> wrote: > > > John Center posted on Fri, 01 Jan 2016 11:41:20 -0500 as excerpted: > > >> If this doesn't resolve the problem, what would you recommend my next > > >> steps should be? I've been hesitant to run too many of the > > >> btrfs-tools, > > >> mainly because I don't want to accidentally screw things up & I don't > > >> always know how to interpret the results. (I ran btrfs-debug-tree, > > >> hoping something obvious would show up. Big mistake. ) > > > > > > LOLed at that debug-tree remark. Been there (with other tools) myself. > > > > > > Well, I'm hoping someone who had the problem can confirm whether it's > > > fixed in current kernels (scrub is one of those userspace commands > > > that's > > > mostly just a front-end to the kernel code which does the real work, so > > > kernel version is the important thing for scrub). I'm guessing so, and > > > that you'll find the problem gone in 4.3. > > > > > > We'll cross the not-gone bridge if we get to it, but again, if the other > > > people who had the similar problem can confirm whether it disappeared > > > for > > > them with the new kernel, it would help a lot, as there were enough such > > > reports that if it's the same problem and still there for everyone > > > (which > > > I doubt as I expect there'd still be way more posts about it if so, but > > > confirmation's always good), nothing to do but wait for a fix, while if > > > not, and you still have your problem, then it's a different issue and > > > the > > > devs will need to work with you on a fix specific to your problem. > > > > Ok, I'm at the next bridge. :-( I upgraded the kernel to 4.4rc7 from > > the Ubuntu Mainline archive & I just ran the scrub: > > > > john@mariposa:~$ sudo /sbin/btrfs scrub start -BdR /dev/md125p2 > > ERROR: scrubbing /dev/md125p2 failed for device id 1: ret=-1, errno=5 > > (Input/output error) > > scrub device /dev/md125p2 (id 1) canceled > > scrub started at Fri Jan 1 19:38:21 2016 and was aborted after 00:02:34 > > data_extents_scrubbed: 111031 > > tree_extents_scrubbed: 104061 > > data_bytes_scrubbed: 2549907456 > > tree_bytes_scrubbed: 1704935424 > > read_errors: 0 > > csum_errors: 0 > > verify_errors: 0 > > no_csum: 1573 > > csum_discards: 0 > > super_errors: 0 > > malloc_errors: 0 > > uncorrectable_errors: 0 > > unverified_errors: 0 > > corrected_errors: 0 > > last_physical: 4729667584 > > > > I checked dmesg & this appeared: > > > > [11428.983355] BTRFS error (device md125p2): parent transid verify > > failed on 241287168 wanted 33554449 found 17 > > [11431.028399] BTRFS error (device md125p2): parent transid verify > > failed on 241287168 wanted 33554449 found 17 > > > > Where do I go from here? > > These and the other errors point at an issue with the filesystem structure. > > As I never had to deal with that, I can only give generic advice: > > 1) Use latest stable btrfs-progs. > > 2) Umount the filesystem and run > > btrfs check (maybe with -p) > > When it finds some errors, proceed with the following steps: > > Without --repair or some of the other options that modify things it is read > only. > > 3) If you can still access all the files, first thing to do is: rsync or > otherwise backup them all to a different location, before attempting > anything to repair the issue. > > 4) If you can´t access some files, you may try to use btrfs restore for > restoring them. > > 5) Then, if you made sure you have an up-to-date backup run > > btrfs --repair Before doing that, review: https://btrfs.wiki.kernel.org/index.php/Btrfsck to learn about other options. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Btrfs Check - "type mismatch with chunk"
Am Donnerstag, 24. Dezember 2015, 23:41:06 CET schrieb Duncan: > Zach Fuller posted on Thu, 24 Dec 2015 13:15:22 -0600 as excerpted: > > I am currently running btrfs on a 2TB GPT drive. The drive is working > > fine, still mounts correctly, and I have experienced no data corruption. > > Whenever I run "btrfs check" on the drive, it returns 100,000+ messages > > stating "bad extent [###, ###), type mismatch with chunk". Whenever I > > try to run "btrfs check --repair" it says that it has fixed the errors, > > but whenever I run "btrfs check" again, the errors return. Should I be > > worried about data/filesystem corruption, > > or are these errors meaningless? > > > > I have my data backed up on 2 different drives, so I can afford to lose > > the entire btrfs drive temporarily. > > > > Here is some info about my system: > > > > $ uname -[r] > > 4.2.5-1-ARCH > > > > > > $ btrfs --version > > btrfs-progs v4.3.1 > > While Chris's reply mentioning a patch is correct, that's not the whole > story and I suspect you have a problem, as the patch is in the userspace > 4.3.1 you're running. > > How long have you had the filesystem? Was it likely created with the > mkfs.btrfs from btrfs-progs v4.1.1 (July, 2015) as I suspect? If so, you > have a problem, as that mkfs.btrfs was buggy and created invalid > filesystems. > > As you have two separate backups and you're not experiencing corruption > or the like so far, you should be fine, but if the filesystem was created > with that buggy mkfs.btrfs, you need to wipe and recreate it as soon as > possible, because it's unstable in its current state and could fail, with > massive corruption, at any point. Unfortunately, the bug created > filesystems so broken that (last I knew anyway, and your experience > agrees) there's no way btrfs check --repair can fix them. The only way > to fix it is to blow away the filesystem and recreate it with a > mkfs.btrfs that doesn't have the bug that 4.1.1 did. Your 4.3.1 should > be fine. > > (The patch Chris mentioned was to btrfs check, as the first set of > patches to it to allow it to detect the problem triggered all sorts of > false-positives and pretty much everybody was flagged as having the > problem. I believe that was patched in the 4.2 series, however, and > you're running 4.3.1, so you should have that patch and the reports > shouldn't be false positives. Tho if you didn't create the filesystem > with the buggy mkfs.btrfs from v4.1.1, there's likely some other problem > to root out, but I'm guessing you did, and thus have the bad filesystem > the patched btrfs check is designed to report, and that report is indeed > valid.) I have this issue as well on one of the filesystems I just checked in order to describe to John how to have a go at fixing his filesystem. A tone of these with different numbers: bad extent [347045888, 347062272), type mismatch with chunk It doesn´t go away with running btrfs check --repair on it. Last scrub was from yesterday and returned with 0 errors. I will rerun a scrub again after the repair attempt. And if its good I will play it safe and redo the filesystem from scratch. It may have been I used a mkfs.btrfs from 4.1.1 for creating it. Would be good if it stored the version of the tool that created the fs into the fs itself to be able to know for sure. It is the youngest BTRFS filesystem on my laptop SSDs. I created it about April 2014 tough. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs scrub failing
Am Freitag, 1. Januar 2016, 13:20:49 CET schrieb John Center: > > On Jan 1, 2016, at 12:41 PM, Martin Steigerwald <mar...@lichtvoll.de> > > wrote: > > Am Freitag, 1. Januar 2016, 11:41:20 CET schrieb John Center: […] > >>> On Jan 1, 2016, at 12:55 AM, Duncan <1i5t5.dun...@cox.net> wrote: > >>> > >>> A couple months ago, which would have made it around the 4.2 kernel > >>> you're running (with 4.3 being current and 4.4 nearly out), there were a > >>> number of similar scrub aborted reports on the list. > >> > >> I must have missed that, I'll check the list again to try & understand > >> the > >> issue better. > > > > I had repeatedly failing scrubs as mentioned in another thread here, until > > I used 4.4 kernel. With 4.3 kernel scrub also didn´t work. I didn´t use > > the debug options you used above and I am not sure whether I had this > > scrub issue with 4.2 already, so I am not sure it has been the same > > issue. But you may need to run 4.4 kernel in order to get scrub working > > again. > > > > See my thread "[4.3-rc4] scrubbing aborts before finishing" for details. > > I was afraid of this. I just read your thread. I generally try to stay away > from kernels so new, but I may have to try it. Was there any reason you > didn't go to 4.1 instead? (I run win8.1 in VirtualBox 5.0.12, when I need > to run somethings under Windows. I'd have to wait until 4.4 is released & > supported to do that.) So far 4.4-rc6 is pretty stable for me. And I think its almost before release as rc7 is out already. Reason for not going with 4.1? Ey, that would be downgrading, wouldn´t it? But sure it is also an option. Virtualbox 5.0.12-dfsg-2 as packaged by Debian runs fine here with 4.4-rc6. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
still kworker at 100% cpu in all of device size allocated with chunks situations with write load
First: Happy New Year to you! Second: Take your time. I know its holidays for many. For me it means I easily have time to follow-up on this. Am Mittwoch, 16. Dezember 2015, 09:20:45 CET schrieb Qu Wenruo: > Chris Mason wrote on 2015/12/15 16:59 -0500: > > On Mon, Dec 14, 2015 at 10:08:16AM +0800, Qu Wenruo wrote: > >> Martin Steigerwald wrote on 2015/12/13 23:35 +0100: > >>> Hi! > >>> > >>> For me it is still not production ready. > >> > >> Yes, this is the *FACT* and not everyone has a good reason to deny it. > >> > >>> Again I ran into: > >>> > >>> btrfs kworker thread uses up 100% of a Sandybridge core for minutes on > >>> random write into big file > >>> https://bugzilla.kernel.org/show_bug.cgi?id=90401 > >> > >> Not sure about guideline for other fs, but it will attract more dev's > >> attention if it can be posted to maillist. > >> > >>> No matter whether SLES 12 uses it as default for root, no matter whether > >>> Fujitsu and Facebook use it: I will not let this onto any customer > >>> machine > >>> without lots and lots of underprovisioning and rigorous free space > >>> monitoring. Actually I will renew my recommendations in my trainings to > >>> be careful with BTRFS. > >>> > >>> From my experience the monitoring would check for: > >>> merkaba:~> btrfs fi show /home > >>> Label: 'home' uuid: […] > >>> > >>> Total devices 2 FS bytes used 156.31GiB > >>> devid1 size 170.00GiB used 164.13GiB path > >>> /dev/mapper/msata-home > >>> devid2 size 170.00GiB used 164.13GiB path > >>> /dev/mapper/sata-home > >>> > >>> If "used" is same as "size" then make big fat alarm. It is not > >>> sufficient for it to happen. It can run for quite some time just fine > >>> without any issues, but I never have seen a kworker thread using 100% > >>> of one core for extended period of time blocking everything else on the > >>> fs without this condition being met.>> > >> And specially advice on the device size from myself: > >> Don't use devices over 100G but less than 500G. > >> Over 100G will leads btrfs to use big chunks, where data chunks can be at > >> most 10G and metadata to be 1G. > >> > >> I have seen a lot of users with about 100~200G device, and hit unbalanced > >> chunk allocation (10G data chunk easily takes the last available space > >> and > >> makes later metadata no where to store) > > > > Maybe we should tune things so the size of the chunk is based on the > > space remaining instead of the total space? > > Submitted such patch before. > David pointed out that such behavior will cause a lot of small > fragmented chunks at last several GB. > Which may make balance behavior not as predictable as before. > > > At least, we can just change the current 10% chunk size limit to 5% to > make such problem less easier to trigger. > It's a simple and easy solution. > > Another cause of the problem is, we understated the chunk size change > for fs at the borderline of big chunk. > > For 99G, its chunk size limit is 1G, and it needs 99 data chunks to > fully cover the fs. > But for 100G, it only needs 10 chunks to covert the fs. > And it need to be 990G to match the number again. > > The sudden drop of chunk number is the root cause. > > So we'd better reconsider both the big chunk size limit and chunk size > limit to find a balanaced solution for it. Did you come to any conclusion here? Is there anything I can change with my home BTRFS filesystem to try to find out what works? Challenge here is that it doesn´t happen under defined circumstances. So far I only know the required condition, but not the sufficient condition for it to happen. Another user run into the issue and reported his findings in the bug report: https://bugzilla.kernel.org/show_bug.cgi?id=90401#c14 Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs scrub failing
Am Freitag, 1. Januar 2016, 11:41:20 CET schrieb John Center: Happy New Year! > > On Jan 1, 2016, at 12:55 AM, Duncan <1i5t5.dun...@cox.net> wrote: > > > > > > John Center posted on Thu, 31 Dec 2015 11:20:28 -0500 as excerpted: > > > > > >> I run a weekly scrub, using Marc Merlin's btrfs-scrub script. > >> Usually, it completes without a problem, but this week it failed. I ran > >> > >> the scrub manually & it stops shortly: > >> > >> > >> john@mariposa:~$ sudo /sbin/btrfs scrub start -BdR /dev/md124p2 > >> ERROR: scrubbing /dev/md124p2 failed for device id 1: > >> ret=-1, errno=5 (Input/output error) > >> scrub device /dev/md124p2 (id 1) canceled > >> scrub started at Thu Dec 31 00:26:34 2015 > >> and was aborted after 00:01:29 [...] > > > > > > > >> My Ubuntu 14.04 workstation is using the 4.2 kernel (Wily). > >> I'm using btrfs-tools v4.3.1. [...] > > > > > > > > A couple months ago, which would have made it around the 4.2 kernel > > you're running (with 4.3 being current and 4.4 nearly out), there were a > > number of similar scrub aborted reports on the list. > > > > > > I must have missed that, I'll check the list again to try & understand the > issue better. I had repeatedly failing scrubs as mentioned in another thread here, until I used 4.4 kernel. With 4.3 kernel scrub also didn´t work. I didn´t use the debug options you used above and I am not sure whether I had this scrub issue with 4.2 already, so I am not sure it has been the same issue. But you may need to run 4.4 kernel in order to get scrub working again. See my thread "[4.3-rc4] scrubbing aborts before finishing" for details. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs und lvm-cache?
Am Mittwoch, 23. Dezember 2015, 11:45:28 CET schrieb Neuer User: > Hello Hi. > I want to setup a small homeserver, based on a HP Microserver Gen8 (4GB > RAM, 2x3TB HDD + 1x120GB SSD) and Proxmox as distro. > > The server will be used to host a (small) number of virtual machines, > most of them being LXC containers, few being KVM machines. One of the > LXC containers will host a fileserver with app 1 TB of data and another > one a backup system for the desktops / laptops in my household, thus > probably holding quite a lot of files. The lxc containers will use the > filesystem of the proxmox host, the KVM machines probably raw disk files > (or qcow2). > > I would like to combine high data integrity with some speed, so I > thought of the following layout: > > - both hdd and ssd in one LVM VG > - one LV on each hdd, containing a btrfs filesystem > - both btrfs LV configured as RAID1 > - the single SDD used as a LVM cache device for both HDD LVs to speed up > random access, where possible > > Now, I wonder if that is a good architecture to go for. Any input on > that? Is btrfs the right way to go for, or should I better go for ZFS > (and purchase some more gigs of RAM)? > > Will there be any problems arising from the lvmcache? btrfs only sees > the HDDs, LVM does the SDD handling. As far as I understand this way you basically loose the RAID 1 semantics of BTRFS. While the data is redundant on the HDDs, it is not redundant on the SSD. It may work for a pure read cache, but for write-through you definately loose any data integrity protection a RAID 1 gives you. Of course, you can use two SSDs and have them work as RAID 1 as well. There is a patch set for in-BTRFS SSD-caching. It consists of a patch set to add hot data tracking to VFS and a patch set for adding support in BTRFS. But I didn´t see anything of these in quite some time. Happy christmas, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [4.3-rc4] scrubbing aborts before finishing (SOLVED)
Am Mittwoch, 16. Dezember 2015, 00:18:53 CET schrieb Martin Steigerwald: > Am Montag, 14. Dezember 2015, 08:59:59 CET schrieb Martin Steigerwald: > > Am Mittwoch, 25. November 2015, 16:35:39 CET schrieben Sie: > > > Am Samstag, 31. Oktober 2015, 12:10:37 CET schrieb Martin Steigerwald: > > > > Am Donnerstag, 22. Oktober 2015, 10:41:15 CET schrieb Martin > Steigerwald: > > > > > I get this: > > > > > > > > > > merkaba:~> btrfs scrub status -d / > > > > > scrub status for […] > > > > > scrub device /dev/mapper/sata-debian (id 1) history > > > > > > > > > > scrub started at Thu Oct 22 10:05:49 2015 and was aborted > > > > > after > > > > > 00:00:00 > > > > > total bytes scrubbed: 0.00B with 0 errors > > > > > > > > > > scrub device /dev/dm-2 (id 2) history > > > > > > > > > > scrub started at Thu Oct 22 10:05:49 2015 and was aborted > > > > > after > > > > > 00:01:30 > > > > > total bytes scrubbed: 23.81GiB with 0 errors > > > > > > > > > > For / scrub aborts for sata SSD immediately. > > > > > > > > > > For /home scrub aborts for both SSDs at some time. […] > I now have 4.4-rc5 running, the boot crash I had appears to be fixed. Oh, > and I see that scrubbing / at leasted worked now: > > merkaba:~> btrfs scrub status -d / > scrub status for […] > scrub device /dev/dm-5 (id 1) history > scrub started at Wed Dec 16 00:13:20 2015 and finished after > 00:01:42 total bytes scrubbed: 23.94GiB with 0 errors > scrub device /dev/mapper/msata-debian (id 2) history > scrub started at Wed Dec 16 00:13:20 2015 and finished after > 00:01:34 total bytes scrubbed: 23.94GiB with 0 errors > > I will check with other BTRFS filesystems tomorrow and report back whether > scrubbing is stable for meagain. This appears to be fixed with 4.4-rc5. Thank you!!! Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Still not production ready
Am Dienstag, 15. Dezember 2015, 16:59:58 CET schrieb Chris Mason: > On Mon, Dec 14, 2015 at 10:08:16AM +0800, Qu Wenruo wrote: > > Martin Steigerwald wrote on 2015/12/13 23:35 +0100: > > >Hi! > > > > > >For me it is still not production ready. > > > > Yes, this is the *FACT* and not everyone has a good reason to deny it. > > > > >Again I ran into: > > > > > >btrfs kworker thread uses up 100% of a Sandybridge core for minutes on > > >random write into big file > > >https://bugzilla.kernel.org/show_bug.cgi?id=90401 > > > > Not sure about guideline for other fs, but it will attract more dev's > > attention if it can be posted to maillist. > > > > >No matter whether SLES 12 uses it as default for root, no matter whether > > >Fujitsu and Facebook use it: I will not let this onto any customer > > >machine > > >without lots and lots of underprovisioning and rigorous free space > > >monitoring. Actually I will renew my recommendations in my trainings to > > >be careful with BTRFS. > > > > > > From my experience the monitoring would check for: > > >merkaba:~> btrfs fi show /home > > >Label: 'home' uuid: […] > > > > > > Total devices 2 FS bytes used 156.31GiB > > > devid1 size 170.00GiB used 164.13GiB path > > > /dev/mapper/msata-home > > > devid2 size 170.00GiB used 164.13GiB path > > > /dev/mapper/sata-home > > > > > >If "used" is same as "size" then make big fat alarm. It is not sufficient > > >for it to happen. It can run for quite some time just fine without any > > >issues, but I never have seen a kworker thread using 100% of one core > > >for extended period of time blocking everything else on the fs without > > >this condition being met.> > > And specially advice on the device size from myself: > > Don't use devices over 100G but less than 500G. > > Over 100G will leads btrfs to use big chunks, where data chunks can be at > > most 10G and metadata to be 1G. > > > > I have seen a lot of users with about 100~200G device, and hit unbalanced > > chunk allocation (10G data chunk easily takes the last available space and > > makes later metadata no where to store) > > Maybe we should tune things so the size of the chunk is based on the > space remaining instead of the total space? Still on my filesystem where was over 1 GiB free on metadata chunks, so… … my theory still is: BTRFS has trouble finding free space in chunks at some time. > > And unfortunately, your fs is already in the dangerous zone. > > (And you are using RAID1, which means it's the same as one 170G btrfs with > > SINGLE data/meta) > > > > >In addition to that last time I tried it aborts scrub any of my BTRFS > > >filesstems. Reported in another thread here that got completely ignored > > >so > > >far. I think I could go back to 4.2 kernel to make this work. > > We'll pick this thread up again, the ones that get fixed the fastest are > the ones that we can easily reproduce. The rest need a lot of think > time. I understand. Maybe I just wanted to see at least some sort of an reaction. I now have 4.4-rc5 running, the boot crash I had appears to be fixed. Oh, and I see that scrubbing / at leasted worked now: merkaba:~> btrfs scrub status -d / scrub status for […] scrub device /dev/dm-5 (id 1) history scrub started at Wed Dec 16 00:13:20 2015 and finished after 00:01:42 total bytes scrubbed: 23.94GiB with 0 errors scrub device /dev/mapper/msata-debian (id 2) history scrub started at Wed Dec 16 00:13:20 2015 and finished after 00:01:34 total bytes scrubbed: 23.94GiB with 0 errors Okay, I test the other ones tomorrow, so maybe this one is fixed meanwhile. Yay! Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [4.3-rc4] scrubbing aborts before finishing (probably solved)
Am Montag, 14. Dezember 2015, 08:59:59 CET schrieb Martin Steigerwald: > Am Mittwoch, 25. November 2015, 16:35:39 CET schrieben Sie: > > Am Samstag, 31. Oktober 2015, 12:10:37 CET schrieb Martin Steigerwald: > > > Am Donnerstag, 22. Oktober 2015, 10:41:15 CET schrieb Martin Steigerwald: > > > > I get this: > > > > > > > > merkaba:~> btrfs scrub status -d / > > > > scrub status for […] > > > > scrub device /dev/mapper/sata-debian (id 1) history > > > > > > > > scrub started at Thu Oct 22 10:05:49 2015 and was aborted > > > > after > > > > 00:00:00 > > > > total bytes scrubbed: 0.00B with 0 errors > > > > > > > > scrub device /dev/dm-2 (id 2) history > > > > > > > > scrub started at Thu Oct 22 10:05:49 2015 and was aborted > > > > after > > > > 00:01:30 > > > > total bytes scrubbed: 23.81GiB with 0 errors > > > > > > > > For / scrub aborts for sata SSD immediately. > > > > > > > > For /home scrub aborts for both SSDs at some time. > > > > > > > > merkaba:~> btrfs scrub status -d /home > > > > scrub status for […] > > > > scrub device /dev/mapper/msata-home (id 1) history > > > > > > > > scrub started at Thu Oct 22 10:09:37 2015 and was aborted > > > > after > > > > 00:01:31 > > > > total bytes scrubbed: 22.03GiB with 0 errors > > > > > > > > scrub device /dev/dm-3 (id 2) history > > > > > > > > scrub started at Thu Oct 22 10:09:37 2015 and was aborted > > > > after > > > > 00:03:34 > > > > total bytes scrubbed: 53.30GiB with 0 errors > > > > > > > > Also single volume BTRFS is affected: > > > > > > > > merkaba:~> btrfs scrub status /daten > > > > scrub status for […] > > > > > > > > scrub started at Thu Oct 22 10:36:38 2015 and was aborted > > > > after > > > > 00:00:00 > > > > total bytes scrubbed: 0.00B with 0 errors > > > > > > > > No errors in dmesg, btrfs device stat or smartctl -a. > > > > > > > > Any known issue? > > > > > > I am still seeing this in 4.3-rc7. It happens so that on one SSD BTRFS > > > doesn´t even start scrubbing. But in the end it aborts it scrubbing > > > anyway. > > > > > > I do not see any other issue so far. But I would really like to be able > > > to > > > scrub my BTRFS filesystems completely again. Any hints? Any further > > > information needed? > > > > > > merkaba:~> btrfs scrub status -d / > > > scrub status for […] > > > scrub device /dev/dm-5 (id 1) history > > > > > > scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00 > > > total bytes scrubbed: 0.00B with 0 errors > > > > > > scrub device /dev/mapper/msata-debian (id 2) status > > > > > > scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:20 > > > total bytes scrubbed: 5.27GiB with 0 errors > > > > > > merkaba:~> btrfs scrub status -d / > > > scrub status for […] > > > scrub device /dev/dm-5 (id 1) history > > > > > > scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00 > > > total bytes scrubbed: 0.00B with 0 errors > > > > > > scrub device /dev/mapper/msata-debian (id 2) status > > > > > > scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:25 > > > total bytes scrubbed: 6.59GiB with 0 errors > > > > > > merkaba:~> btrfs scrub status -d / > > > scrub status for […] > > > scrub device /dev/dm-5 (id 1) history > > > > > > scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00 > > > total bytes scrubbed: 0.00B with 0 errors > > > > > > scrub device /dev/mapper/msata-debian (id 2) status > > > > > > scrub started at Sat Oct 31 11:58:45 2015, running for 00:01:25 > > > total bytes scrubbed: 21.97GiB with 0 errors > > > > > > merkaba:~> btrfs scrub status -d / > > > scrub status for […] > > > scrub device /dev/dm-5 (id 1
safety of journal based fs (was: Re: still kworker at 100% cpu…)
Hi! Using a different subject for the journal fs related things which are off topic, but still interesting. Might make sense to move to fsdevel-ml or ext4/ XFS mailing lists? Otherwise, I suggest we focus on BTRFS here. Still wanted to reply. Am Montag, 14. Dezember 2015, 16:48:58 CET schrieb Qu Wenruo: > Martin Steigerwald wrote on 2015/12/14 09:18 +0100: > > Am Montag, 14. Dezember 2015, 10:08:16 CET schrieb Qu Wenruo: > >> Martin Steigerwald wrote on 2015/12/13 23:35 +0100: […] > >>> I am seriously consider to switch to XFS for my production laptop again. > >>> Cause I never saw any of these free space issues with any of the XFS or > >>> Ext4 filesystems I used in the last 10 years. > >> > >> Yes, xfs and ext4 is very stable for normal use case. > >> > >> But at least, I won't recommend xfs yet, and considering the nature or > >> journal based fs, I'll recommend backup power supply in crash recovery > >> for both of them. > >> > >> Xfs already messed up several test environment of mine, and an > >> unfortunate double power loss has destroyed my whole /home ext4 > >> partition years ago. > > > > Wow. I have never seen this. Actual I teach journal filesystems being > > quite > > safe on power losses as long as cache flushes (former barrier) > > functionality is active and working. With one caveat: It relies on one > > sector being either completely written or not. I never seen any > > scientific proof for that on usual storage devices. > > The journal is used to be safe against power loss. > That's OK. > > But the problem is, when recovering journal, there is no journal of > journal, to keep journal recovering safe from power loss. But the journal should be safe due to a journal commit being one sector? Of course for the last changes without a journal commit its: The stuff is gone. > And that's the advantage of COW file system, no need of journal completely. > Although Btrfs is less safe than stable journal based fs yet. > > >> [xfs story] > >> After several crash, xfs makes several corrupted file just to 0 size. > >> Including my kernel .git directory. Then I won't trust it any longer. > >> No to mention that grub2 support for xfs v5 is not here yet. > > > > That is no filesystem metadata structure crash. It is a known issue with > > delayed allocation. Same with Ext4. I teach this as well in my performance > > analysis & tuning course. > > Unfortunately, it's not about delayed allocation, as it's not a new > file, it's file already here with contents in previous transaction. > The workload should only rewrite the files.(Not sure though) For what I know the overwriting after truncating case is also related to the delayed allocation, deferred write thing: File has been truncated to zero bytes in journal, while no data has been written. But well for Ext4 / XFS it doesn´t need to reallocate in this case. > And for ext4 case, I'll see corrupted files, but not truncated to 0 size. > So IMHO it may be related to xfs recovery behavior. > But not sure as I never read xfs codes. Journals online provide *metadata* consistency. Unless you use Ext4 with data=journal, which is supposed to be much slower, but in some workloads its actually faster. Even Andrew Morton had not explaination for that, however I do have an idea about it. Also data=journal is interesting, if you put journal for harddisk based Ext4 onto an SSD or an SSD RAID 1 or so. > > Also BTRFS in principle has this issue I believe. As far as I am aware it > > has a fix for the rename case, not using delayed allocation in the case. > > Due to its COW nature it may not be affected at all however, I don´t > > know. > Anyway for rewrite case, none of these fs should truncate fs size to 0. > However, it seems xfs doesn't follow the way though. > Although I'm not 100% sure, as after that disaster I reinstall my test > box using ext4. > > (Maybe next time I should try btrfs, at least when it fails, I have my > chance to submit new patches to kernel or btrfsck) I do think its the applications doing that on overwriting a file. Rewriting a config file for example. Its either write new file, rename to old, or truncate to zero bytes and rewrite. Of course, its different for databases or other files written into without rewriting them. But there you need data=journal on Ext4. XFS doesn´t guarentee file consistency at all in that case, unless the application serializes changes with fsync() properly by using an in application journal for the data to write. > >> [ext4 story] > >> For ext4, when recovering my /home partition after a power loss, a new > >> power loss hap
still kworker at 100% cpu in all of device size allocated with chunks situations with write load (was: Re: Still not production ready)
Am Sonntag, 13. Dezember 2015, 15:19:14 CET schrieb Marc MERLIN: > On Sun, Dec 13, 2015 at 11:35:08PM +0100, Martin Steigerwald wrote: > > Hi! > > > > For me it is still not production ready. Again I ran into: > > > > btrfs kworker thread uses up 100% of a Sandybridge core for minutes on > > random write into big file > > https://bugzilla.kernel.org/show_bug.cgi?id=90401 > > Sorry you're having issues. I haven't seen this before myself. > I couldn't find the kernel version you're using in your Email or the bug > you filed (quick scan). > > That's kind of important :) I definately know this much. :) It happened with 4.3 yesterday. The other kernel version was 3.18. Information should be in the bug report. Yeah, 3.18 as mentioned in the Kernel Version field. And 4.3 as I mentioned in the last comment of the bug report. The scrubbing issue is I think since 4.3, I also seen it with 4.4-rc2/rc4 I believe, but I didn´t go back then to check more toroughly. I didn´t report the scrubbing issue with bugzilla yet as I got no feedback on my mailing list posts so far. I will bump the thread in a moment and suggest we discuss free space issue here and scrubbing issue in the other thread. I went back to 4.3 cause 4.4-rc2/4 does not even boot on my machine most of the times. I also reported this (BTRFS unrelated one). Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [4.3-rc4] scrubbing aborts before finishing
Am Mittwoch, 25. November 2015, 16:35:39 CET schrieben Sie: > Am Samstag, 31. Oktober 2015, 12:10:37 CET schrieb Martin Steigerwald: > > Am Donnerstag, 22. Oktober 2015, 10:41:15 CET schrieb Martin Steigerwald: > > > I get this: > > > > > > merkaba:~> btrfs scrub status -d / > > > scrub status for […] > > > scrub device /dev/mapper/sata-debian (id 1) history > > > > > > scrub started at Thu Oct 22 10:05:49 2015 and was aborted after > > > 00:00:00 > > > total bytes scrubbed: 0.00B with 0 errors > > > > > > scrub device /dev/dm-2 (id 2) history > > > > > > scrub started at Thu Oct 22 10:05:49 2015 and was aborted after > > > 00:01:30 > > > total bytes scrubbed: 23.81GiB with 0 errors > > > > > > For / scrub aborts for sata SSD immediately. > > > > > > For /home scrub aborts for both SSDs at some time. > > > > > > merkaba:~> btrfs scrub status -d /home > > > scrub status for […] > > > scrub device /dev/mapper/msata-home (id 1) history > > > > > > scrub started at Thu Oct 22 10:09:37 2015 and was aborted after > > > 00:01:31 > > > total bytes scrubbed: 22.03GiB with 0 errors > > > > > > scrub device /dev/dm-3 (id 2) history > > > > > > scrub started at Thu Oct 22 10:09:37 2015 and was aborted after > > > 00:03:34 > > > total bytes scrubbed: 53.30GiB with 0 errors > > > > > > Also single volume BTRFS is affected: > > > > > > merkaba:~> btrfs scrub status /daten > > > scrub status for […] > > > > > > scrub started at Thu Oct 22 10:36:38 2015 and was aborted after > > > 00:00:00 > > > total bytes scrubbed: 0.00B with 0 errors > > > > > > No errors in dmesg, btrfs device stat or smartctl -a. > > > > > > Any known issue? > > > > I am still seeing this in 4.3-rc7. It happens so that on one SSD BTRFS > > doesn´t even start scrubbing. But in the end it aborts it scrubbing > > anyway. > > > > I do not see any other issue so far. But I would really like to be able to > > scrub my BTRFS filesystems completely again. Any hints? Any further > > information needed? > > > > merkaba:~> btrfs scrub status -d / > > scrub status for […] > > scrub device /dev/dm-5 (id 1) history > > > > scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00 > > total bytes scrubbed: 0.00B with 0 errors > > > > scrub device /dev/mapper/msata-debian (id 2) status > > > > scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:20 > > total bytes scrubbed: 5.27GiB with 0 errors > > > > merkaba:~> btrfs scrub status -d / > > scrub status for […] > > scrub device /dev/dm-5 (id 1) history > > > > scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00 > > total bytes scrubbed: 0.00B with 0 errors > > > > scrub device /dev/mapper/msata-debian (id 2) status > > > > scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:25 > > total bytes scrubbed: 6.59GiB with 0 errors > > > > merkaba:~> btrfs scrub status -d / > > scrub status for […] > > scrub device /dev/dm-5 (id 1) history > > > > scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00 > > total bytes scrubbed: 0.00B with 0 errors > > > > scrub device /dev/mapper/msata-debian (id 2) status > > > > scrub started at Sat Oct 31 11:58:45 2015, running for 00:01:25 > > total bytes scrubbed: 21.97GiB with 0 errors > > > > merkaba:~> btrfs scrub status -d / > > scrub status for […] > > scrub device /dev/dm-5 (id 1) history > > > > scrub started at Sat Oct 31 11:58:45 2015 and was aborted after > > > > 00:00:00 total bytes scrubbed: 0.00B with 0 errors > > scrub device /dev/mapper/msata-debian (id 2) history > > > > scrub started at Sat Oct 31 11:58:45 2015 and was aborted after > > > > 00:01:32 total bytes scrubbed: 23.63GiB with 0 errors > > > > > > For the sake of it I am going to btrfs check one of the filesystem where > > BTRFS aborts scrubbing (which is all of the laptop filesystems, not only > > the RAID 1 one). > > > > I will use the /daten filesystem as I can unmount it dur
still kworker at 100% cpu in all of device size allocated with chunks situations with write load (was: Re: Still not production ready)
Am Montag, 14. Dezember 2015, 10:08:16 CET schrieb Qu Wenruo: > Martin Steigerwald wrote on 2015/12/13 23:35 +0100: > > Hi! > > > > For me it is still not production ready. > > Yes, this is the *FACT* and not everyone has a good reason to deny it. > > > Again I ran into: > > > > btrfs kworker thread uses up 100% of a Sandybridge core for minutes on > > random write into big file > > https://bugzilla.kernel.org/show_bug.cgi?id=90401 > > Not sure about guideline for other fs, but it will attract more dev's > attention if it can be posted to maillist. I did, as mentioned in the bug report: BTRFS free space handling still needs more work: Hangs again Martin Steigerwald | 26 Dec 14:37 2014 http://permalink.gmane.org/gmane.comp.file-systems.btrfs/41790 > > No matter whether SLES 12 uses it as default for root, no matter whether > > Fujitsu and Facebook use it: I will not let this onto any customer machine > > without lots and lots of underprovisioning and rigorous free space > > monitoring. Actually I will renew my recommendations in my trainings to > > be careful with BTRFS. > > > > From my experience the monitoring would check for: > > merkaba:~> btrfs fi show /home > > Label: 'home' uuid: […] > > > > Total devices 2 FS bytes used 156.31GiB > > devid1 size 170.00GiB used 164.13GiB path > > /dev/mapper/msata-home > > devid2 size 170.00GiB used 164.13GiB path > > /dev/mapper/sata-home > > > > If "used" is same as "size" then make big fat alarm. It is not sufficient > > for it to happen. It can run for quite some time just fine without any > > issues, but I never have seen a kworker thread using 100% of one core for > > extended period of time blocking everything else on the fs without this > > condition being met. > And specially advice on the device size from myself: > Don't use devices over 100G but less than 500G. > Over 100G will leads btrfs to use big chunks, where data chunks can be > at most 10G and metadata to be 1G. > > I have seen a lot of users with about 100~200G device, and hit > unbalanced chunk allocation (10G data chunk easily takes the last > available space and makes later metadata no where to store) Interesting, but in my case there is still quite some free space in already allocated metadata chunks. Anyway, I did had enospc issues on trying to balance the chunks. > And unfortunately, your fs is already in the dangerous zone. > (And you are using RAID1, which means it's the same as one 170G btrfs > with SINGLE data/meta) Well, I know for any FS its not recommended to let it run to full and leave about 10-15% free at least, but while it is not 10-15% anymore, its still a whopping 11-12 GiB of free space. I would accept a somewhat slower operation in this case, but no kworker at 100% for about 10-30 seconds blocking everything else on going on on the filesystem. For whatever reason Plasma seems to access the fs on almost every action I do with it, so not even panels slide out anymore or activity switcher works during that time. > > In addition to that last time I tried it aborts scrub any of my BTRFS > > filesstems. Reported in another thread here that got completely ignored so > > far. I think I could go back to 4.2 kernel to make this work. > > Unfortunately, this happens a lot of times, even you posted it to mail list. > Devs here are always busy locating bugs or adding new features or > enhancing current behavior. > > So *PLEASE* be patient about such slow response. Okay, thanks at least for the acknowledgement of this. I try to be even more patient. > BTW, you may not want to revert to 4.2 until some bug fix is backported > to 4.2. > As qgroup rework in 4.2 has broken delayed ref and caused some scrub > bugs. (My fault) Hm, well scrubbing does not work for me either. But since 4.3/4.4rc2/4. I just bumped the thread: Re: [4.3-rc4] scrubbing aborts before finishing by replying a well by replying a third time to it (not fourth, miscounted:). > > I am not going to bother to go into more detail on any on this, as I get > > the impression that my bug reports and feedback get ignored. So I spare > > myself the time to do this work for now. > > > > > > Only thing I wonder now whether this all could be cause my /home is > > already > > more than one and a half year old. Maybe newly created filesystems are > > created in a way that prevents these issues? But it already has a nice > > global reserve: > > > > merkaba:~> btrfs fi df / > > Data, RAID1: total=27.98GiB, used=24.07GiB > > System, RAID1: total=19.00MiB, used=16.00KiB &g
Re: still kworker at 100% cpu in all of device size allocated with chunks situations with write load
Hi Qu. I reply to the journal fs things in a mail with a different subject. Am Montag, 14. Dezember 2015, 16:48:58 CET schrieb Qu Wenruo: > Martin Steigerwald wrote on 2015/12/14 09:18 +0100: > > Am Montag, 14. Dezember 2015, 10:08:16 CET schrieb Qu Wenruo: > >> Martin Steigerwald wrote on 2015/12/13 23:35 +0100: […] > >> GlobalReserve is just a reserved space *INSIDE* metadata for some corner > >> case. So its profile is always single. > >> > >> The real problem is, how we represent it in btrfs-progs. > >> > >> If it output like below, I think you won't complain about it more: > >> > merkaba:~> btrfs fi df / > >> > Data, RAID1: total=27.98GiB, used=24.07GiB > >> > System, RAID1: total=19.00MiB, used=16.00KiB > >> > Metadata, RAID1: total=2.00GiB, used=728.80MiB > >> > >> Or > >> > >> > merkaba:~> btrfs fi df / > >> > Data, RAID1: total=27.98GiB, used=24.07GiB > >> > System, RAID1: total=19.00MiB, used=16.00KiB > >> > Metadata, RAID1: total=2.00GiB, used=(536.80 + 192.00)MiB > >> > > >> > \ GlobalReserve: total=192.00MiB, used=0.00B > > > > Oh, the global reserve is *inside* the existing metadata chunks? Thats > > interesting. I didn´t know that. > > And I have already submit btrfs-progs patch to change the default output > of 'fi df'. > > Hopes to solve the problem. Nice. Thank you. It clarifies it quite a bit. I always wondered why its single. On which device does it allocate it in a RAID 1? Also can the data stored in there temporarily be recreated in case of loosing a device? In case that not, BTRFS would not guarantee that one device of a RAID 1 can fail at all times. Ciao, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Still not production ready
Hi! For me it is still not production ready. Again I ran into: btrfs kworker thread uses up 100% of a Sandybridge core for minutes on random write into big file https://bugzilla.kernel.org/show_bug.cgi?id=90401 No matter whether SLES 12 uses it as default for root, no matter whether Fujitsu and Facebook use it: I will not let this onto any customer machine without lots and lots of underprovisioning and rigorous free space monitoring. Actually I will renew my recommendations in my trainings to be careful with BTRFS. >From my experience the monitoring would check for: merkaba:~> btrfs fi show /home Label: 'home' uuid: […] Total devices 2 FS bytes used 156.31GiB devid1 size 170.00GiB used 164.13GiB path /dev/mapper/msata-home devid2 size 170.00GiB used 164.13GiB path /dev/mapper/sata-home If "used" is same as "size" then make big fat alarm. It is not sufficient for it to happen. It can run for quite some time just fine without any issues, but I never have seen a kworker thread using 100% of one core for extended period of time blocking everything else on the fs without this condition being met. In addition to that last time I tried it aborts scrub any of my BTRFS filesstems. Reported in another thread here that got completely ignored so far. I think I could go back to 4.2 kernel to make this work. I am not going to bother to go into more detail on any on this, as I get the impression that my bug reports and feedback get ignored. So I spare myself the time to do this work for now. Only thing I wonder now whether this all could be cause my /home is already more than one and a half year old. Maybe newly created filesystems are created in a way that prevents these issues? But it already has a nice global reserve: merkaba:~> btrfs fi df / Data, RAID1: total=27.98GiB, used=24.07GiB System, RAID1: total=19.00MiB, used=16.00KiB Metadata, RAID1: total=2.00GiB, used=536.80MiB GlobalReserve, single: total=192.00MiB, used=0.00B Actually when I see that this free space thing is still not fixed for good I wonder whether it is fixable at all. Is this an inherent issue of BTRFS or more generally COW filesystem design? I think it got somewhat better. It took much longer to come into that state again than last time, but still, blocking like this is *no* option for a *production ready* filesystem. I am seriously consider to switch to XFS for my production laptop again. Cause I never saw any of these free space issues with any of the XFS or Ext4 filesystems I used in the last 10 years. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: shall distros run btrfsck on boot?
Am Mittwoch, 25. November 2015, 07:32:34 CET schrieb Austin S Hemmelgarn: > On 2015-11-24 17:26, Eric Sandeen wrote: > > On 11/24/15 2:38 PM, Austin S Hemmelgarn wrote: > >> if the system was > >> shut down cleanly, you're fine barring software bugs, but if it > >> crashed, you should be running a check on the FS. > > > > Um, no... > > > > The *entire point* of having a journaling filesystem is that after a > > crash or power loss, a journal replay on next mount will bring the > > metadata into a consistent state. > > OK, first, that was in reference to BTRFS, not ext4, and BTRFS is a COW > filesystem, not a journaling one, which is an important distinction as > mentioned by Hugo in his reply. Second, there are two reasons that you > should be running a check even of a journaled filesystem when the system > crashes (this also applies to COW filesystems, and anything else that > relies on atomicity of write operations for consistency): > > 1. Disks don't atomically write anything bigger than a sector, and may > not even atomically write the sector itself. This means that it's > possible to get a partial write to the journal, which in turn has > significant potential to put the metadata in an inconsistent state when > the journal gets replayed (IIRC, ext4 has a journal_checksum mount > option that is supposed to mitigate this possibility). This sounds like > something that shouldn't happen all that often, but on a busy > filesystem, the probability is exactly proportionate to the size of the > journal relative to the size of the FS. > > 2. If the system crashed, all code running on it immediately before the > crash is instantly suspect, and you have no way to know for certain that > something didn't cause random garbage to be written to the disk. On top > of this, hardware is potentially suspect, and when your hardware is > misbehaving, then all bets as to consistency are immediately off. In the case of shaky hardware a fsck run can report bogus data, i.e. problems where they are none or vice versa. If I suspect defect memory or controller I would check the device on different hardware only. Especially on attempts to repair any possible issues. -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [4.3-rc4] scrubbing aborts before finishing
Am Samstag, 31. Oktober 2015, 12:10:37 CET schrieb Martin Steigerwald: > Am Donnerstag, 22. Oktober 2015, 10:41:15 CET schrieb Martin Steigerwald: > > I get this: > > > > merkaba:~> btrfs scrub status -d / > > scrub status for […] > > scrub device /dev/mapper/sata-debian (id 1) history > > > > scrub started at Thu Oct 22 10:05:49 2015 and was aborted after > > 00:00:00 > > total bytes scrubbed: 0.00B with 0 errors > > > > scrub device /dev/dm-2 (id 2) history > > > > scrub started at Thu Oct 22 10:05:49 2015 and was aborted after > > 00:01:30 > > total bytes scrubbed: 23.81GiB with 0 errors > > > > For / scrub aborts for sata SSD immediately. > > > > For /home scrub aborts for both SSDs at some time. > > > > merkaba:~> btrfs scrub status -d /home > > scrub status for […] > > scrub device /dev/mapper/msata-home (id 1) history > > > > scrub started at Thu Oct 22 10:09:37 2015 and was aborted after > > 00:01:31 > > total bytes scrubbed: 22.03GiB with 0 errors > > > > scrub device /dev/dm-3 (id 2) history > > > > scrub started at Thu Oct 22 10:09:37 2015 and was aborted after > > 00:03:34 > > total bytes scrubbed: 53.30GiB with 0 errors > > > > Also single volume BTRFS is affected: > > > > merkaba:~> btrfs scrub status /daten > > scrub status for […] > > > > scrub started at Thu Oct 22 10:36:38 2015 and was aborted after > > 00:00:00 > > total bytes scrubbed: 0.00B with 0 errors > > > > No errors in dmesg, btrfs device stat or smartctl -a. > > > > Any known issue? > > I am still seeing this in 4.3-rc7. It happens so that on one SSD BTRFS > doesn´t even start scrubbing. But in the end it aborts it scrubbing anyway. > > I do not see any other issue so far. But I would really like to be able to > scrub my BTRFS filesystems completely again. Any hints? Any further > information needed? > > merkaba:~> btrfs scrub status -d / > scrub status for […] > scrub device /dev/dm-5 (id 1) history > scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00 > total bytes scrubbed: 0.00B with 0 errors > scrub device /dev/mapper/msata-debian (id 2) status > scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:20 > total bytes scrubbed: 5.27GiB with 0 errors > merkaba:~> btrfs scrub status -d / > scrub status for […] > scrub device /dev/dm-5 (id 1) history > scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00 > total bytes scrubbed: 0.00B with 0 errors > scrub device /dev/mapper/msata-debian (id 2) status > scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:25 > total bytes scrubbed: 6.59GiB with 0 errors > merkaba:~> btrfs scrub status -d / > scrub status for […] > scrub device /dev/dm-5 (id 1) history > scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00 > total bytes scrubbed: 0.00B with 0 errors > scrub device /dev/mapper/msata-debian (id 2) status > scrub started at Sat Oct 31 11:58:45 2015, running for 00:01:25 > total bytes scrubbed: 21.97GiB with 0 errors > merkaba:~> btrfs scrub status -d / > scrub status for […] > scrub device /dev/dm-5 (id 1) history > scrub started at Sat Oct 31 11:58:45 2015 and was aborted after > 00:00:00 total bytes scrubbed: 0.00B with 0 errors > scrub device /dev/mapper/msata-debian (id 2) history > scrub started at Sat Oct 31 11:58:45 2015 and was aborted after > 00:01:32 total bytes scrubbed: 23.63GiB with 0 errors > > > For the sake of it I am going to btrfs check one of the filesystem where > BTRFS aborts scrubbing (which is all of the laptop filesystems, not only > the RAID 1 one). > > I will use the /daten filesystem as I can unmount it during laptop runtime > easily. There scrubbing aborts immediately: > > merkaba:~> btrfs scrub start /daten > scrub started on /daten, fsid […] (pid=13861) > merkaba:~> btrfs scrub status /daten > scrub status for […] > scrub started at Sat Oct 31 12:04:25 2015 and was aborted after > 00:00:00 total bytes scrubbed: 0.00B with 0 errors > > It is single device: > > merkaba:~> btrfs fi sh /daten > Label: 'daten' uuid: […] > Total devices 1 FS bytes used 227.23GiB > devid1 size 230.00GiB used 230.00GiB path > /dev/mapper/msata-daten > > btrfs-progs v4.2.2 > merkaba:~> btrfs fi df /daten > Data, single: total=228.99GiB, used=226.7
Re: [RFC][PATCH 00/12] Enhanced file stat system call
Am Dienstag, 24. November 2015, 00:13:08 CET schrieb Christoph Hellwig: > On Fri, Nov 20, 2015 at 05:19:31PM +0100, Martin Steigerwald wrote: > > I know its mostly relevant for just for FAT32, but on any account rather > > than trying to write 4 GiB and then file, it would be good to at some > > time get a dialog at the beginning of the copy. > > pathconf/fpathconf is supposed to handle that. It's not super pretty > but part of Posix. Linus hates it, but it might be time to give it > another try. It might be interesting for BTRFS as well, to be able to ask what amount of free space there currently is *at* a given path. Cause with BTRFS and Subvolumes this may differ between different paths. Even tough its not implemented yet, it may be possible in the future to have one subvolume with RAID 1 profile and one with RAID 0 profile. That said an application wanting to make sure it can write a certain amount of data can use fallocate. And thats thats the only reliable way to ensure it, I know of. Which can become tedious for several files, but there is no principal problem with preallocating all files if their sizes are known. Even rsync or desktop environments could work like that. First fallocate everything, then, only if that succeeds, start actually copying data. Disadvantage: On aborted copies you have all files with their correct sizes and no easy indicates on where the copy stopped. Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Unclear error message when running btrfs check on a mountpoint
Hi! With kernel 4.3-rc7 and btrfs-progs 4.2.2 I get: merkaba:~> btrfs check /daten Superblock bytenr is larger than device size Couldn't open file system It took me a moment to see that I used a mountpoint and that this may be the reason for the error message. Maybe check for a device file as argument and give a clearer error message in this case? Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: behavior of BTRFS in relation to inodes when moving/copying files between filesystems
Am Dienstag, 13. Oktober 2015, 12:39:12 CET schrieben Sie: > Hi! > > With BTRFS to XFS/Ext4 the inode number of the target file stays the same in > with both cp and mv case (/mnt/zeit is a freshly created XFS in this example): > > merkaba:~> ls -li foo /mnt/zeit/moo > 6609270 foo > 99 /mnt/zeit/moo > merkaba:~> cp foo /mnt/zeit/moo > merkaba:~> ls -li foo /mnt/zeit/moo > 6609270 8 foo > 99 /mnt/zeit/moo > merkaba:~> cp -p foo /mnt/zeit/moo > merkaba:~> ls -li foo /mnt/zeit/moo > 6609270 foo > 99 /mnt/zeit/moo > merkaba:~> mv foo /mnt/zeit/moo > merkaba:~> ls -lid /mnt/zeit/moo > 99 -rw-r--r-- 1 root root 6 Okt 13 12:28 /mnt/zeit/moo > > > With BTRFS as target filesystem however in the mv case I get a new inode: > > merkaba:~> ls -li foo /home/moo > 6609289 -rw-r--r-- 1 root root 6 Okt 13 12:34 foo > 16476276 -rw-r--r-- 1 root root 6 Okt 13 12:34 /home/moo > merkaba:~> cp foo /home/moo > merkaba:~> ls -li foo /home/moo > 6609289 -rw-r--r-- 1 root root 6 Okt 13 12:34 foo > 16476276 -rw-r--r-- 1 root root 6 Okt 13 12:34 /home/moo > merkaba:~> cp -p foo /home/moo > merkaba:~> ls -li foo /home/moo > 6609289 -rw-r--r-- 1 root root 6 Okt 13 12:34 foo > 16476276 -rw-r--r-- 1 root root 6 Okt 13 12:34 /home/moo > merkaba:~> mv foo /home/moo > merkaba:~> ls -li /home/moo > 16476280 -rw-r--r-- 1 root root 6 Okt 13 12:34 /home/moo > > > Is this intentional and/or somehow related to the copy on write specifics of > the filesystem? > > I think even with COW it can just overwrite the existing file instead of > removing the old one and creating a new one – but it wouldn´t give much of a > benefit unless the target file is nocow. > > (Also I thought only certain other utilities had supercow powers, but well > BTRFS seems to have them as well :) Anyone any idea? Thanks, -- Martin -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [4.3-rc4] scrubbing aborts before finishing
Am Donnerstag, 22. Oktober 2015, 10:41:15 CET schrieb Martin Steigerwald: > I get this: > > merkaba:~> btrfs scrub status -d / > scrub status for […] > scrub device /dev/mapper/sata-debian (id 1) history > scrub started at Thu Oct 22 10:05:49 2015 and was aborted after > 00:00:00 > total bytes scrubbed: 0.00B with 0 errors > scrub device /dev/dm-2 (id 2) history > scrub started at Thu Oct 22 10:05:49 2015 and was aborted after > 00:01:30 > total bytes scrubbed: 23.81GiB with 0 errors > > For / scrub aborts for sata SSD immediately. > > For /home scrub aborts for both SSDs at some time. > > merkaba:~> btrfs scrub status -d /home > scrub status for […] > scrub device /dev/mapper/msata-home (id 1) history > scrub started at Thu Oct 22 10:09:37 2015 and was aborted after > 00:01:31 > total bytes scrubbed: 22.03GiB with 0 errors > scrub device /dev/dm-3 (id 2) history > scrub started at Thu Oct 22 10:09:37 2015 and was aborted after > 00:03:34 > total bytes scrubbed: 53.30GiB with 0 errors > > Also single volume BTRFS is affected: > > merkaba:~> btrfs scrub status /daten > scrub status for […] > scrub started at Thu Oct 22 10:36:38 2015 and was aborted after > 00:00:00 > total bytes scrubbed: 0.00B with 0 errors > > > No errors in dmesg, btrfs device stat or smartctl -a. > > Any known issue? I am still seeing this in 4.3-rc7. It happens so that on one SSD BTRFS doesn´t even start scrubbing. But in the end it aborts it scrubbing anyway. I do not see any other issue so far. But I would really like to be able to scrub my BTRFS filesystems completely again. Any hints? Any further information needed? merkaba:~> btrfs scrub status -d / scrub status for […] scrub device /dev/dm-5 (id 1) history scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00 total bytes scrubbed: 0.00B with 0 errors scrub device /dev/mapper/msata-debian (id 2) status scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:20 total bytes scrubbed: 5.27GiB with 0 errors merkaba:~> btrfs scrub status -d / scrub status for […] scrub device /dev/dm-5 (id 1) history scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00 total bytes scrubbed: 0.00B with 0 errors scrub device /dev/mapper/msata-debian (id 2) status scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:25 total bytes scrubbed: 6.59GiB with 0 errors merkaba:~> btrfs scrub status -d / scrub status for […] scrub device /dev/dm-5 (id 1) history scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00 total bytes scrubbed: 0.00B with 0 errors scrub device /dev/mapper/msata-debian (id 2) status scrub started at Sat Oct 31 11:58:45 2015, running for 00:01:25 total bytes scrubbed: 21.97GiB with 0 errors merkaba:~> btrfs scrub status -d / scrub status for […] scrub device /dev/dm-5 (id 1) history scrub started at Sat Oct 31 11:58:45 2015 and was aborted after 00:00:00 total bytes scrubbed: 0.00B with 0 errors scrub device /dev/mapper/msata-debian (id 2) history scrub started at Sat Oct 31 11:58:45 2015 and was aborted after 00:01:32 total bytes scrubbed: 23.63GiB with 0 errors For the sake of it I am going to btrfs check one of the filesystem where BTRFS aborts scrubbing (which is all of the laptop filesystems, not only the RAID 1 one). I will use the /daten filesystem as I can unmount it during laptop runtime easily. There scrubbing aborts immediately: merkaba:~> btrfs scrub start /daten scrub started on /daten, fsid […] (pid=13861) merkaba:~> btrfs scrub status /daten scrub status for […] scrub started at Sat Oct 31 12:04:25 2015 and was aborted after 00:00:00 total bytes scrubbed: 0.00B with 0 errors It is single device: merkaba:~> btrfs fi sh /daten Label: 'daten' uuid: […] Total devices 1 FS bytes used 227.23GiB devid1 size 230.00GiB used 230.00GiB path /dev/mapper/msata-daten btrfs-progs v4.2.2 merkaba:~> btrfs fi df /daten Data, single: total=228.99GiB, used=226.79GiB System, single: total=4.00MiB, used=48.00KiB Metadata, single: total=1.01GiB, used=449.50MiB GlobalReserve, single: total=160.00MiB, used=0.00B I do not see any output in btrfs check that points to any issue: merkaba:~> btrfs check /dev/msata/daten Checking filesystem on /dev/msata/daten UUID: 7918274f-e2ec-4983-bbb0-aa93ef95fcf7 checking extents checking free space cache checking fs roots checking csums checking root refs found 243936530607 bytes used err is 0 total csum bytes: 237758932 total tree bytes: 471384064 total fs tree bytes: 116473856 total extent tree bytes: 78544896 btree space waste bytes: 57523323 file data blocks allocated: 422700576768 refer