Re: I need to P. are we almost there yet?
On 03/01/2015 14:11, Duncan wrote:
> Bob Marley posted on Sat, 03 Jan 2015 12:34:41 +0100 as excerpted:
>> On 29/12/2014 19:56, sys.syphus wrote:
>>> specifically (P)arity. very specifically n+2. when will raid5/raid6
>>> be at least as safe to run as raid1 currently is? I don't like the
>>> idea of being 2 bad drives away from total catastrophe. (and yes i
>>> backup, it just wouldn't be fun to go down that route.)
>>
>> What about using btrfs on top of MD raid?
>
> The problem with that is data integrity. mdraid doesn't have it;
> btrfs does. If you present a single mdraid device to btrfs and run
> single mode on it, and one copy on the mdraid is corrupt, mdraid may
> well simply present it as-is, since it does no integrity checking.
> btrfs will catch and reject that, but because it sees a single
> device, it'll think the entire thing is corrupt.

Which is really not that bad, considering how small the chance is that something gets corrupted in the first place: it is already an exceedingly rare event. Detection without correction can be more than enough. The computer science field has always worked without even the detection feature: most likely your bank account and mine are held in databases sitting on filesystems or block devices which do not even detect corruption. And, last but not least, as of now a btrfs bug is more likely than a hard disk's silent data corruption.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
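The stacking Duncan describes can be sketched as follows. This is a hypothetical setup (device names assumed, not from the thread): btrfs in "single" data mode on top of an md array. md supplies the redundancy but does no integrity checking, so btrfs checksums can *detect* corruption here but cannot repair it, because btrfs only sees one device:

```shell
# Hypothetical devices: build an md RAID6 array, put single-profile btrfs on it.
mdadm --create /dev/md0 --level=6 --raid-devices=4 /dev/sd[b-e]
mkfs.btrfs -d single -m single /dev/md0
mount /dev/md0 /mnt

# A scrub will report (but not be able to fix) any checksum mismatches,
# since btrfs has no second copy of its own to repair from:
btrfs scrub start -B /mnt
```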
Re: I need to P. are we almost there yet?
On 29/12/2014 19:56, sys.syphus wrote:
> specifically (P)arity. very specifically n+2. when will raid5/raid6
> be at least as safe to run as raid1 currently is? I don't like the
> idea of being 2 bad drives away from total catastrophe. (and yes i
> backup, it just wouldn't be fun to go down that route.)

What about using btrfs on top of MD raid?
Re: device balance times
On 22/10/2014 14:40, Piotr Pawłow wrote:
> On 22.10.2014 03:43, Chris Murphy wrote:
>> On Oct 21, 2014, at 4:14 PM, Piotr Pawłow <p...@siedziba.pl> wrote:
>>> Looks normal to me. Last time I started a balance after adding a
>>> 6th device to my FS, it took 4 days to move 25GB of data.
>>
>> It's long-term untenable. At some point it must be fixed. It's way,
>> way slower than md raid. At a certain point it needs to fall back to
>> block-level copying, with a ~32KB block. It can't be treating things
>> as if they're 1K files, doing file-level copying that takes forever.
>> It's just too risky that another device fails in the meantime.
>
> There's device replace for restoring redundancy, which is fast, but
> not implemented yet for RAID5/6.

Device replace on raid 0/1/10 works if the device to be replaced is still alive; otherwise the operation takes as long as a rebalance and works similarly (AFAIR). Which is way too long in terms of the likelihood of another disk failing. Additionally, it seeks like crazy during the operation, which also greatly increases the likelihood of another disk failing. Until this is fixed I am not confident in using btrfs on a production system which requires RAID redundancy.

The operation needs to be streamlined: it should be as sequential as possible (sort files according to their LBA before reading/writing), with the fewest number of seeks on every disk, and with large buffers, so that reads from the source disk(s) and writes to the replacement disk go at platter speed or near it.
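The two paths discussed above can be sketched like this (device names and mountpoint are hypothetical): `btrfs replace` streams data directly onto the new disk, while add+delete triggers the slow relocation the posters complain about:

```shell
# Fast path (works for raid0/1/10 at the time of writing, not yet raid5/6):
btrfs replace start /dev/sdd /dev/sdf /mnt   # sdd = failing disk, sdf = new disk
btrfs replace status /mnt

# Slow fallback: add the new device, then delete the old one, which
# relocates every block group much like a full balance:
btrfs device add /dev/sdf /mnt
btrfs device delete /dev/sdd /mnt
```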
Re: What is the vision for btrfs fs repair?
On 10/10/2014 03:58, Chris Murphy wrote:
> * mount -o recovery
>   Enable autorecovery attempts if a bad tree root is found at mount
>   time.
>
> I'm confused why it's not the default yet. Maybe it's continuing to
> evolve at a pace that suggests something could sneak in that makes
> things worse? It is almost an oxymoron in that I'm manually enabling
> an autorecovery. If true, maybe the closest indication we'd get of
> btrfs stability is the default enabling of autorecovery.

No way! I wouldn't want a default like that. Think of distributed transactions: suppose a sync was issued on both sides of a distributed transaction, then power was lost on one side, and then btrfs had corruption. When I remount it, definitely the worst thing that can happen is that it auto-rolls-back to a previous known-good state.

Now, if I can express wishes: I would like an option that lists all the usable tree roots (or what's the name, superblocks?) and not just the newest one, which is corrupt. Then another option that lets me mount *readonly* starting from the tree root I specify, so I can check how much of the data is still there. After I decide that such a tree root is good, I need another option that lets me mount with that tree root in read-write mode, obviously discarding all tree roots newer than that.

Some time ago I read that mounting the filesystem with an earlier tree root was possible, but only by manually erasing the disk regions in which the newer superblocks are. That is crazy: it's too risky on too many levels, and, as I wrote, I want to check what data is available from a certain tree root before mounting it read-write.
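Part of the wished-for workflow already exists in read-only form, for what it's worth. A sketch with a hypothetical device and tree-root byte number (the bytenr would come from the first command's output):

```shell
# List candidate tree roots found on the device:
btrfs-find-root /dev/sdb1

# Dry-run a restore from a specific tree root (-t <bytenr>, -D = dry run)
# to see how much data is reachable from it, without writing anything:
btrfs restore -t 123456789 -D /dev/sdb1 /tmp/ignored

# Read-only mount attempt letting the kernel fall back to a usable root:
mount -o ro,recovery /dev/sdb1 /mnt
```

What is missing, per the post, is the ability to *commit* to an older tree root in read-write mode without manually destroying newer superblocks.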
Re: What is the vision for btrfs fs repair?
On 10/10/2014 12:59, Roman Mamedov wrote:
> On Fri, 10 Oct 2014 12:53:38 +0200 Bob Marley <bobmar...@shiftmail.org> wrote:
>> No way! I wouldn't want a default like that. If you think of
>> distributed transactions: suppose a sync was issued on both sides of
>> a distributed transaction, then power was lost on one side...
>
> What distributed transactions? Btrfs is not a clustered
> filesystem[1], it does not support and likely will never support
> being mounted from multiple hosts at the same time.
>
> [1] http://en.wikipedia.org/wiki/Clustered_file_system

That is not the only way to do a distributed transaction. Databases can be hosted on the filesystem, and those can do distributed transactions. Think of two bank accounts: one in a database on btrfs fs1 here, and another in a database on whatever filesystem in another country. You want to debit one account and credit the other: the filesystems at the two sides *must not roll back their state*!! (Especially not transparently, without human intervention.)
Re: What is the vision for btrfs fs repair?
On 10/10/2014 16:37, Chris Murphy wrote:
> The fail-safe behavior is to treat the known good tree root as the
> default tree root, and bypass the bad tree root if it cannot be
> repaired, so that the volume can be mounted with default mount
> options (i.e. the ones in fstab). Otherwise it's a filesystem that
> isn't well suited for general purpose use as rootfs, let alone for
> boot.

A filesystem which is suited for general-purpose use is a filesystem which honors fsync and doesn't *ever* auto-roll-back without user intervention. Anything different is not suited for database transactions at all. Any paid service which keeps its user database on btrfs is going to be at risk of losing payments, probably without the company even knowing. If btrfs goes this way, I hope a big warning is written on the wiki and in the manpages saying that this filesystem is totally unsuitable for hosting databases performing transactions.

At most, I can suggest adding a flag in the metadata to allow/disallow auto-rollback-on-error on a given filesystem, so people can choose between the tolerant and the transaction-safe mode at filesystem creation.
Performance reduces with nodatasum
Hello, apparently I have found an issue with btrfs: performance drops with nodatasum on multi-device raid0 or single profiles.

I was testing with a series of 8 LIO ramdisks, with btrfs on those in multi-device single mode, writing zeroes to the filesystem with 16 dd processes in parallel. Performance decreases significantly if the filesystem is mounted with nodatasum, or with nodatacow (which implies nodatasum). CPU usage also drops, together with speed, as seen with htop.

At first I thought it was my problem, but then I saw this web page http://www.linux-mag.com/id/7308/3/ which also reports reduced performance with nodatasum and multi-device raid0 or single. See e.g. these two lines (four throughput figures per configuration):

  Btrfs two disks, single, standard:             50.144  50.264  126.984  131.130
  Btrfs two disks, single, nodatacow+nodatasum:  43.834  47.603  131.612  131.470

and similarly with raid0, even more so with compression:

  Btrfs two disks, raid0, -o compress:                     70.234  69.048  130.852  129.928
  Btrfs two disks, raid0, -o compress nodatacow+nodatasum: 48.762  48.831  130.812  130.202

If you go higher in performance, such as with ramdisks in the GB/sec range, the drop is even larger than that: I have noticed up to 50% reduction. It would be important to fix this problem for high-performance uses of btrfs.

Best regards
BM
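The test described above can be sketched roughly as follows (device names hypothetical; the original used 8 LIO ramdisks appearing as SCSI disks). Run once with `nodatasum` and once without, and compare aggregate throughput:

```shell
# Multi-device single-profile filesystem across the 8 ramdisk-backed devices:
mkfs.btrfs -d single -m raid1 /dev/sd[b-i]
mount -o nodatasum /dev/sdb /mnt    # repeat the run without nodatasum

# 16 parallel sequential writers of zeroes, as in the original test:
for i in $(seq 16); do
    dd if=/dev/zero of=/mnt/zero$i bs=1M count=1024 &
done
wait
```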
Re: Performance reduces with nodatasum
On 04/10/2014 12:26, Bob Marley wrote:
> Hello, apparently I have found an issue with btrfs

Sorry, I forgot to mention the kernel version: 3.14.19. Not tested with later versions.
Re: Performance reduces with nodatasum
On 04/10/2014 12:36, Bob Marley wrote:
> On 04/10/2014 12:26, Bob Marley wrote:
>> Hello, apparently I have found an issue with btrfs
>
> Sorry, I forgot to mention the kernel version: 3.14.19

I just noticed that the page I linked, which also reports the problem, http://www.linux-mag.com/id/7308/3/ is dated April 21st, 2009, with kernel version 2.6.30-rc1. So this problem is not a recent regression but has probably been there since the beginning. It's likely present in the latest 3.17-rc7 as well, even if I can't check right now.

Best regards
BM
Re: 1 week to rebuid 4x 3TB raid10 is a long time!
On 20/07/2014 10:45, TM wrote:
> Hi, I have a raid10 with 4x 3TB disks on a microserver
> http://n40l.wikia.com/wiki/Base_Hardware_N54L , 8GB RAM.
> Recently one disk started to fail (smart errors), so I replaced it:
> mounted as degraded, added the new disk, removed the old one.
> Started yesterday. I am monitoring /var/log/messages and it seems it
> will take a long time. Started at about 8010631739392, and 20 hours
> later I am at 6910631739392:
>   btrfs: relocating block group 6910631739392 flags 65
> At this rate it will take a week to complete the raid rebuild!!!
> Furthermore it seems that the operation is getting slower and slower:
> when the rebuild started I had a new message every half a minute, now
> it's up to one and a half minutes. Most files are small files like
> flac/jpeg.

Hi TM, are you doing other significant filesystem activity during this rebuild, especially random accesses? That can reduce performance a lot on HDDs. E.g. if you were doing strenuous multithreaded random writes in the meanwhile, I could expect even less than 5MB/sec overall...
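TM's week estimate checks out as a back-of-the-envelope calculation: relocation walks block-group offsets downward from ~8.01T toward 0, and the two log numbers plus the elapsed time give a rate. A sketch:

```shell
start=8010631739392   # first block-group offset seen in the log
now=6910631739392     # offset reached after 20 hours
hours=20

rate=$(( (start - now) / hours ))   # address space covered per hour
eta_h=$(( now / rate ))             # hours still to go at this rate
echo "~$eta_h hours (~$(( eta_h / 24 )) days) to go"
```

That is about 5 more days even at the *current* rate, and the post notes the rate is still dropping.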
Re: Especially broken btrfs
Hi, I hadn't noticed this post. I think I know the reason this time: you have used USB, you bad guy!

I think USB does not support flush/barrier, which is mandatory for btrfs to work correctly in case of power loss. For most filesystems actually, but the damage suffered by COW filesystems such as btrfs is much more severe than for update-in-place filesystems such as ext4. Please check whether, when you connect the USB drive, you see in dmesg something like:

  [ . .] sd ...:0:0:0: [sdf] Write cache: ., read cache: ., doesn't support DPO or FUA

Regards
BM

On 21/03/2014 04:21, sepero...@gmx.com wrote:
> Hello all. I submit bugs to different foss projects regularly, but I
> don't really have a bug report this time. I have a broken filesystem
> to report, and I have no idea how to reproduce it. I am including a
> link to the filesystem itself, because it appears to be unrepairable
> and unrestorable. I have no personal information on the disk image.
>
> The filesystem is almost 512MB uncompressed. I was using it on an old
> usb drive with a 512MB size limit. I only used (abused?) it about 2
> days before this corruption. My goal was to use the usb stick as a
> bootable rescue system. I decided to try Btrfs instead of Ext4,
> because it supports filesystem compression.
>
> BTRFS IMAGE LINK (please pardon my file hosting service)
> http://www.mediafire.com/download/gdaydt3mz8uwtmm/sdb1.btrfs.xz
>
> These are some things that may have helped to cause the corruption:
> + Created btrfs with -M flag
> + Installed Debian testing/unstable
> + When mounting, I always used at least these options:
>   ssd_spread,noatime,compression=zlib,autodefrag
> + Occasionally force powering off the computer.
> + While booted into the usb system, I was constantly running out of
>   space while trying to install new packages.
>
> It is my hope that this image might be used to improve the btrfs
> restore and btrfsck tools. Please let me know if I can provide any
> further information. Big thanks to everyone helping to further the
> development of Btrfs.
> Sepero
Re: btrfs and ECC RAM
On 20/01/2014 15:57, Ian Hinder wrote:
> i.e. that there is parity information stored with every piece of
> data, and ZFS will correct errors automatically from the parity
> information.

So this is not just parity data to check correctness: there are many more additional bits to actually correct these errors, based on an algorithm like Reed-Solomon? Where can I find information on how much parity is stored in ZFS? I start to suspect that there is confusion here between checksumming for data integrity and parity information.

> If this is really how ZFS works, then if memory corruption interferes
> with this process, I can see how a scrub could be devastating.

I don't think so. If you have additional bits to correct errors (rather than just detect them), this will never be worse than having fewer of them. No algorithm I know of behaves any worse when the erroneous bits fall in the checksum part, or when the algorithm is detect+correct instead of detect-only. If the algorithm stores X+2Y extra bits (the supposed ZFS case) in order to detect+correct Y erroneous bits and detect X additional erroneous bits, this will not be worse than having just X checksum bits (the btrfs case). So does ZFS really use detect+correct parity? I'd expect it to be quite computationally expensive.

> I don't know if ZFS really works like this. It sounds very odd to do
> this without an additional checksum check. This sounds very different
> to what you say below that btrfs does, which is only to check against
> redundantly-stored copies, which I agree sounds much safer. The
> second link above from the ZFS FAQ just says that if you place a very
> high value on data integrity, you should be using ECC memory anyway,
> which I'm sure we can all agree with.

hxxp://zfsonlinux.org/faq.html#DoIHaveToUseECCMemory:

  1.16 Do I have to use ECC memory for ZFS?
  Using ECC memory for ZFS is strongly recommended for enterprise
  environments where the strongest data integrity guarantees are
  required. Without ECC memory rare random bit flips caused by cosmic
  rays or by faulty memory can go undetected. If this were to occur
  ZFS (or any other filesystem) will write the damaged data to disk
  and be unable to automatically detect the corruption.

The above sentence imho means that the data can get corrupted just prior to its first write. That is obviously applicable to every non-ECC filesystem on earth, especially if the corruption happens before the computation of the parity.

BM
Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
On 19/10/2013 16:03, Stefan Behrens wrote:
> On 10/19/2013 12:32, Shilong Wang wrote:
>> Yeah, it does not hurt, but it may output a checksum mismatch. For
>> example: writing a 4K superblock is not totally finished, but we are
>> trying to scrub it.
>
> Have you ever seen this issue? ... If this is really an issue and
> these 4K disk writes and reads interfere, let's find a better
> solution please.

Why not scrub optimistically, as now, and only in case of a checksum mismatch re-scrub in transaction context?
Re: raid6: rmw writes all the time?
On 23/05/2013 15:22, Bernd Schubert wrote:
> Yeah, I know and I'm using iostat already. md raid6 does not do rmw,
> but does not fill the device queue; afaik it flushes the underlying
> devices quickly as it does not have barrier support - that is another
> topic, but it was the reason why I started to test btrfs.

MD raid6 DOES have barrier support!
Re: BTRFS, getting darn slower everyday
On 12/09/12 12:38, Hugo Mills wrote:
> On Sun, Dec 09, 2012 at 12:20:46PM +0100, Swâmi Petaramesh wrote:
>> On 09/12/2012 11:41, Roman Mamedov wrote:
>>> A CoW filesystem incurs fragmentation by its nature, not
>>> specifically because of snapshots. Even without snapshots,
>>> rewriting portions of existing files will write the new blocks not
>>> over the original ones, but elsewhere, thus increasing
>>> fragmentation.
>>
>> Is it to be expected that somewhere in the future, BTRFS will be
>> able to defragment itself without duplicating snapshot data?
>
> In the presence of snapshots that are modified, no, it's impossible
> to fully defrag all the files.

Of course, but I would agree with the poster that it would be important to partially defrag all the files, avoiding at least unneeded fragmentation: at least the fragmentation generated by normal writes.
High-sensitivity fs checker (not repairer) for btrfs
Hello all,

I would like to know if there exists a tool to check a btrfs filesystem very thoroughly. It's OK if it needs the FS unmounted to operate; mounted is also OK. It does not need repair capability, but it needs very good checking capability: it has to return a Good/Bad status, where Bad means there is at least ONE inconsistency, and Good means the filesystem is really, truly 100% consistent.

Does something like this exist? We need to detect, as far ahead of time as possible, whether the btrfs filesystem has become even just a little bit inconsistent.

Thank you
Re: High-sensitivity fs checker (not repairer) for btrfs
On 11/10/12 22:23, Hugo Mills wrote:
> The closest thing is btrfsck. That's about as picky as we've got to
> date. What exactly is your use-case for this requirement?

We need a decently-available system. We can roll the filesystem back to the last-known-good state if the test detects an inconsistency in the current btrfs filesystem, but we need a very good test for that (i.e. if last-known-good is actually bad, we get into serious trouble). So do you think btrfsck can return a false OK result? Can it miss an inconsistency?

Thank you
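The rollback gate described above would reduce to something like the following sketch (device and mountpoint hypothetical): run btrfsck without `--repair` (read-only by default) against the unmounted device and branch on its exit code. The open question in the thread is whether a zero exit code can be trusted as "really 100% consistent":

```shell
umount /mnt/data
if btrfsck /dev/sdb1; then
    echo "GOOD: no inconsistency found"
else
    echo "BAD: at least one inconsistency; roll back to last-known-good"
fi
```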
Systemcall for offline deduplication
Hello all btrfs developers,

I would really appreciate a system call (or ioctl or the like) to deduplicate a block of one file against a block of another file (it's fine if the blocks need to be aligned to filesystem blocks). So that if I know that bytes 32768...65536 of FileA are identical to bytes 131072...163840 of FileB, I can call it to have the two regions deduplicated against each other, atomically and with the filesystem mounted. The call should presumably check that the regions are really equal and perform the deduplication atomically.

This would be the starting point for a lot of deduplication algorithms in userspace. It would be a killer feature for backup systems.

Thank you, Bob
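[Editorial note, not part of the original post:] exactly this interface later landed in btrfs as the BTRFS_IOC_FILE_EXTENT_SAME ioctl, generalized as FIDEDUPERANGE; the kernel verifies the two ranges are byte-identical before sharing the extents. From a shell it can be driven via xfs_io's `dedupe` command. A sketch with the hypothetical files and offsets from the post (arguments: source file, source offset, destination offset, length; run against the destination file):

```shell
# Deduplicate bytes 32768..65536 of FileA against bytes 131072..163840
# of FileB (length 32768); the kernel rejects the request if the
# ranges differ.
xfs_io -c "dedupe /data/FileA 32768 131072 32768" /data/FileB
```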