Re: [patch] fix up lock order reversal in writeback

2010-11-18 Thread Nick Piggin
On Wed, Nov 17, 2010 at 10:28:34PM -0800, Andrew Morton wrote: On Thu, 18 Nov 2010 17:00:00 +1100 Nick Piggin npig...@kernel.dk wrote: On Wed, Nov 17, 2010 at 07:29:00PM -0800, Andrew Morton wrote: On Wed, 17 Nov 2010 22:06:13 -0500 Ted Ts'o ty...@mit.edu wrote: On Wed, Nov 17,

Re: A little confused about what remains to make a stable release

2010-11-18 Thread Hugo Mills
On Wed, Nov 17, 2010 at 05:46:30PM -0700, Anthony Roberts wrote: It's stable *for you* when it functions with the workloads *you* expect of it, with a failure rate that is acceptable *to you*. I think there's a few ancillary things like a working fsck needed before it can even be

Re: [patch] fix up lock order reversal in writeback

2010-11-18 Thread Theodore Tso
On Nov 18, 2010, at 3:18 AM, Nick Piggin wrote: s_count just prevents it from going away, but s_umount is still needed to keep umount, remount,ro, freezing etc activity away. I don't think there is an easy way to do it. Hmm what about encoding the fact that we are in the process of

Re: Interesting problem with write data.

2010-11-18 Thread Wout Mertens
I think test 2 actually makes dd allocate 1GB of memory and then write it to disk. So if you don't have 1GB free you may be testing your swap. Also, what kernel/btrfs version are you using? In what situations are you experiencing slowness? Wout. On Nov 18, 2010, at 7:19 , Magicloud Magiclouds

Re: ls flush-btrfs-1 sit at 100% sys

2010-11-18 Thread Daniel J Blueman
On 18 November 2010 06:03, Brian Sullivan bexam...@gmail.com wrote: Nothing shows up in dmesg. [ 8114.870020] ls            R  running task        0  3438   3375 0x0004 [ 8114.870020]  88036339dab8 0086 88036339da60 88036339dfd8 [ 8114.870020]  000139c0

Re: Update to Project_ideas wiki page

2010-11-18 Thread Gordan Bobic
Bart Kus wrote: On 11/17/2010 10:07 AM, Gordan Bobic wrote: On 11/17/2010 05:56 PM, Hugo Mills wrote: On Wed, Nov 17, 2010 at 04:12:29PM +0100, Bart Noordervliet wrote: Can I suggest we combine this new RAID level management with a modernisation of the terminology for storage redundancy, as

Re: Interesting problem with write data.

2010-11-18 Thread Tomasz Chmielewski
Recently, I made a btrfs to use. And I met slowness problem. Trying to diag it. I found this: 1. dd if=/dev/zero of=test count=1024 bs=1MB This is fast, at about 25MB/s, and reasonable iowait. 2. dd if=/dev/zero of=test count=1 bs=1GB This is pretty slow, at about 1.5MB/s, and 90%+ iowait,
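The two dd runs write roughly the same amount of data but use very different buffer sizes. Below is a minimal sketch of the arithmetic. It assumes GNU dd's suffix rules (MB = 1000×1000, M = 1024×1024, GB = 1000³). `dd_size` and `dd_plan` are hypothetical helpers, not part of dd.

```python
# Multipliers for the GNU dd size suffixes relevant here (a subset only).
DD_SUFFIXES = {
    "": 1,
    "kB": 1000, "K": 1024,
    "MB": 1000**2, "M": 1024**2,
    "GB": 1000**3, "G": 1024**3,
}

def dd_size(spec: str) -> int:
    """Convert a dd size such as '1MB' or '1GB' into bytes (hypothetical helper)."""
    i = 0
    while i < len(spec) and spec[i].isdigit():
        i += 1
    number, suffix = int(spec[:i]), spec[i:]
    return number * DD_SUFFIXES[suffix]

def dd_plan(bs: str, count: int) -> tuple:
    """Return (total bytes written, size of the single buffer dd works with)."""
    block = dd_size(bs)
    return block * count, block

print(dd_plan("1MB", 1024))  # test 1: (1024000000, 1000000)
print(dd_plan("1GB", 1))     # test 2: (1000000000, 1000000000)
```

Test 2 asks dd for a single buffer of 1,000,000,000 bytes. This matches the point made earlier in the thread that dd has to allocate that much memory before it writes anything.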

SI units

2010-11-18 Thread Helmut Hullen
Hello, linux-btrfs, when I invoke btrfs filesystem show then it shows the size of my Terabyte disks in TiByte but tells TB. It's a difference of about 10% - either there should be a switch like in df (option -H or --si), or TB should be changed to TiB (the same with GiB, MiB etc.)
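The roughly 10% gap follows directly from the two unit systems. 1 TiB is 1024⁴ bytes and 1 TB is 1000⁴ bytes, a ratio of about 1.0995. Below is a minimal sketch that formats one byte count both ways, in the spirit of df's `-h`/`-H` switches. `human_size` is a hypothetical helper, not the btrfs-progs code.

```python
def human_size(n_bytes: int, si: bool) -> str:
    """Format a byte count with SI (powers of 1000) or IEC (powers of 1024) prefixes."""
    base = 1000 if si else 1024
    units = ["B", "kB", "MB", "GB", "TB"] if si else ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(n_bytes)
    for unit in units:
        # Stop at the first unit where the value drops below the base,
        # or at the largest unit available.
        if value < base or unit == units[-1]:
            return f"{value:.2f}{unit}"
        value /= base

one_tb_disk = 10**12                     # a "1 TB" disk as sold
print(human_size(one_tb_disk, si=True))   # 1.00TB
print(human_size(one_tb_disk, si=False))  # 931.32GiB
print(round(1024**4 / 1000**4, 4))        # 1.0995, i.e. about 10%
```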

Re: SI units

2010-11-18 Thread Helmut Hullen
Hello, Hugo, you wrote on 18.11.10: when I invoke btrfs filesystem show then it shows the size of my Terabyte disks in TiByte but tells TB. It's a difference of about 10% - either there should be a switch like in df (option -H or --si), or TB should be changed to TiB (the same

Re: SI units

2010-11-18 Thread Hugo Mills
On Thu, Nov 18, 2010 at 02:53:00PM +0100, Helmut Hullen wrote: You wrote on 18.11.10: when I invoke btrfs filesystem show then it shows the size of my Terabyte disks in TiByte but tells TB. It's a difference of about 10% - either there should be a switch like in df

Re: Interesting problem with write data.

2010-11-18 Thread Chris Mason
Excerpts from Tomasz Chmielewski's message of 2010-11-18 07:03:31 -0500: Recently, I made a btrfs to use. And I met slowness problem. Trying to diag it. I found this: 1. dd if=/dev/zero of=test count=1024 bs=1MB This is fast, at about 25MB/s, and reasonable iowait. 2. dd if=/dev/zero

Re: Update to Project_ideas wiki page

2010-11-18 Thread Bart Noordervliet
On Wed, Nov 17, 2010 at 19:07, Gordan Bobic gor...@bobich.net wrote: Since BTRFS is already doing some relatively radical things, I would like to suggest that RAID5 and RAID6 be deemed obsolete. RAID5 isn't safely usable for arrays bigger than about 5TB with disks that have a specified error
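The argument behind this size limit is that a RAID5 rebuild has to read every remaining disk in full. The chance of hitting at least one unrecoverable read error therefore grows with the size of the array. Below is a sketch of that arithmetic. It assumes independent bit errors at a fixed per-bit rate. The rate the message actually cites is cut off above, so the 1-in-10¹⁴ figure used here is only an illustrative assumption.

```python
import math

def p_read_error(bytes_read: float, bit_error_rate: float) -> float:
    """Probability of at least one unrecoverable read error while reading
    `bytes_read` bytes, assuming independent errors at `bit_error_rate` per bit."""
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed via log1p so it stays accurate for tiny p.
    return 1.0 - math.exp(bits * math.log1p(-bit_error_rate))

# Illustrative assumption: 1 error per 1e14 bits, 5 TB read during a rebuild.
print(f"{p_read_error(5e12, 1e-14):.2f}")  # 0.33
```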

Re: SI units

2010-11-18 Thread Helmut Hullen
Hello, Hugo, you wrote on 18.11.10: I posted patches[1] to do just that, a few weeks ago. [...] Am I using an antique version? No, that's the latest version, as far as I know. The patches haven't been picked up and integrated by Chris yet. (In fact, I should probably send them

Re: Interesting problem with write data.

2010-11-18 Thread Tomasz Chmielewski
On 18.11.2010 15:23, Chris Mason wrote: Excerpts from Tomasz Chmielewski's message of 2010-11-18 07:03:31 -0500: Recently, I made a btrfs to use. And I met slowness problem. Trying to diag it. I found this: 1. dd if=/dev/zero of=test count=1024 bs=1MB This is fast, at about 25MB/s, and

Re: Update to Project_ideas wiki page

2010-11-18 Thread Gordan Bobic
Bart Noordervliet wrote: On Wed, Nov 17, 2010 at 19:07, Gordan Bobic gor...@bobich.net wrote: Since BTRFS is already doing some relatively radical things, I would like to suggest that RAID5 and RAID6 be deemed obsolete. RAID5 isn't safely usable for arrays bigger than about 5TB with disks that

Re: Interesting problem with write data.

2010-11-18 Thread Chris Mason
Excerpts from Tomasz Chmielewski's message of 2010-11-18 09:57:34 -0500: On 18.11.2010 15:23, Chris Mason wrote: Excerpts from Tomasz Chmielewski's message of 2010-11-18 07:03:31 -0500: Recently, I made a btrfs to use. And I met slowness problem. Trying to diag it. I found this: 1. dd

Re: Update to Project_ideas wiki page

2010-11-18 Thread Justin Ossevoort
On 18/11/10 15:31, Bart Noordervliet wrote: On Wed, Nov 17, 2010 at 19:07, Gordan Bobic gor...@bobich.net wrote: Since BTRFS is already doing some relatively radical things, I would like to suggest that RAID5 and RAID6 be deemed obsolete. RAID5 isn't safely usable for arrays bigger than about

Re: Poor performance unlinking hard-linked files (repost)

2010-11-18 Thread Chris Mason
Excerpts from Bron Gondwana's message of 2010-11-16 23:11:48 -0500: On Tue, Nov 16, 2010 at 08:38:13AM -0500, Chris Mason wrote: Excerpts from Bron Gondwana's message of 2010-11-16 07:54:45 -0500: Just posting this again more neatly formatted and just the 'meat': a) program creates

Re: Interesting problem with write data.

2010-11-18 Thread Tomasz Chmielewski
On 18.11.2010 16:07, Chris Mason wrote: (...) [27821.906513] btrfs-cache-8 D 88050c5fde98 0 8089 2 0x [27821.906517] 88051c3a9b60 0046 88051c3a9b00 88051c3a9fd8 [27821.906522] 000139c0 000139c0 88051c3a9fd8 88051c3a9fd8

Re: Interesting problem with write data.

2010-11-18 Thread Chris Mason
Excerpts from Tomasz Chmielewski's message of 2010-11-18 10:39:05 -0500: On 18.11.2010 16:07, Chris Mason wrote: (...) [27821.906513] btrfs-cache-8 D 88050c5fde98 0 8089 2 0x [27821.906517] 88051c3a9b60 0046 88051c3a9b00 88051c3a9fd8

Re: Interesting problem with write data.

2010-11-18 Thread Chris Mason
Excerpts from Tomasz Chmielewski's message of 2010-11-18 11:00:16 -0500: On 18.11.2010 16:54, Chris Mason wrote: Excerpts from Tomasz Chmielewski's message of 2010-11-18 10:39:05 -0500: On 18.11.2010 16:07, Chris Mason wrote: (...) [27821.906513] btrfs-cache-8 D 88050c5fde98 0

Re: [patch] fix up lock order reversal in writeback

2010-11-18 Thread Andrew Morton
On Thu, 18 Nov 2010 08:55:18 -0600 Eric Sandeen sand...@redhat.com wrote: Can we just delete writeback_inodes_sb_nr_if_idle() and writeback_inodes_sb_if_idle()? The changelog for 17bd55d037a02 is pretty handwavy - do we know that deleting these things would make a jot of difference?

Re: [patch] fix up lock order reversal in writeback

2010-11-18 Thread Andrew Morton
On Thu, 18 Nov 2010 19:18:22 +1100 Nick Piggin npig...@kernel.dk wrote: On Wed, Nov 17, 2010 at 10:28:34PM -0800, Andrew Morton wrote: Logically I'd expect i_mutex to nest inside s_umount. Because s_umount is a per-superblock thing, and i_mutex is a per-file thing, and files live under

Re: [patch] fix up lock order reversal in writeback

2010-11-18 Thread Eric Sandeen
On 11/18/10 11:10 AM, Andrew Morton wrote: On Thu, 18 Nov 2010 08:55:18 -0600 Eric Sandeen sand...@redhat.com wrote: Can we just delete writeback_inodes_sb_nr_if_idle() and writeback_inodes_sb_if_idle()? The changelog for 17bd55d037a02 is pretty handwavy - do we know that deleting these

Re: A little confused about what remains to make a stable release

2010-11-18 Thread Anthony Roberts
Beyond that, the management capabilities at this point don't look ready for long term use in a production environment. By this I mean adding/removing disks, That much is already there and working. Only for the basics though, yes? Disks can be added, but IIRC you can't really control RAID

Re: [patch] fix up lock order reversal in writeback

2010-11-18 Thread Eric Sandeen
On 11/18/10 12:04 PM, Eric Sandeen wrote: On 11/18/10 11:10 AM, Andrew Morton wrote: On Thu, 18 Nov 2010 08:55:18 -0600 Eric Sandeen sand...@redhat.com wrote: Can we just delete writeback_inodes_sb_nr_if_idle() and writeback_inodes_sb_if_idle()? The changelog for 17bd55d037a02 is pretty

Re: ls flush-btrfs-1 sit at 100% sys

2010-11-18 Thread Brian Sullivan
Yep actually, with noatime,nodiratime ls is fine. I didn't try ro but I assume that'll work too. So with noatime,nodiratime I can go around in tree and ls works. If I try to touch a new file, touch doesn't return. If I then ls in that same folder ls doesn't return either. So yeah seems like

Re: [patch] fix up lock order reversal in writeback

2010-11-18 Thread Chris Mason
Excerpts from Andrew Morton's message of 2010-11-18 01:28:34 -0500: I'm not sure that s_umount versus i_mutex has come up before. Logically I'd expect i_mutex to nest inside s_umount. Because s_umount is a per-superblock thing, and i_mutex is a per-file thing, and files live under

Re: [patch] fix up lock order reversal in writeback

2010-11-18 Thread Chris Mason
Excerpts from Eric Sandeen's message of 2010-11-18 13:24:57 -0500: Um, ok, then, to answer the question directly : No, please don't delete those functions, it will break ENOSPC handling in ext4 as shown by xfstests regression test #204 ... Further - What is going on here is that

Re: [patch] fix up lock order reversal in writeback

2010-11-18 Thread Andrew Morton
On Thu, 18 Nov 2010 12:04:21 -0600 Eric Sandeen sand...@redhat.com wrote: On 11/18/10 11:10 AM, Andrew Morton wrote: On Thu, 18 Nov 2010 08:55:18 -0600 Eric Sandeen sand...@redhat.com wrote: Can we just delete writeback_inodes_sb_nr_if_idle() and writeback_inodes_sb_if_idle()? The

Re: [patch] fix up lock order reversal in writeback

2010-11-18 Thread Chris Mason
Excerpts from Andrew Morton's message of 2010-11-18 13:36:38 -0500: On Thu, 18 Nov 2010 12:04:21 -0600 Eric Sandeen sand...@redhat.com wrote: On 11/18/10 11:10 AM, Andrew Morton wrote: On Thu, 18 Nov 2010 08:55:18 -0600 Eric Sandeen sand...@redhat.com wrote: Can we just delete

Re: [patch] fix up lock order reversal in writeback

2010-11-18 Thread Al Viro
On Wed, Nov 17, 2010 at 10:06:13PM -0500, Ted Ts'o wrote: This makes sense to me as well. Acked-by: Theodore Ts'o ty...@mit.edu So how do we want to send this patch to Linus? It's a writeback change, so through some mm tree? Or it lives in fs/fs-writeback.c (which I always thought was

Re: [patch] fix up lock order reversal in writeback

2010-11-18 Thread Eric Sandeen
On 11/18/10 12:36 PM, Andrew Morton wrote: On Thu, 18 Nov 2010 12:04:21 -0600 Eric Sandeen sand...@redhat.com wrote: On 11/18/10 11:10 AM, Andrew Morton wrote: On Thu, 18 Nov 2010 08:55:18 -0600 Eric Sandeen sand...@redhat.com wrote: Can we just delete writeback_inodes_sb_nr_if_idle() and

Re: [patch] fix up lock order reversal in writeback

2010-11-18 Thread Andrew Morton
On Thu, 18 Nov 2010 13:02:43 -0600 Eric Sandeen sand...@redhat.com wrote: On 11/18/10 12:36 PM, Andrew Morton wrote: On Thu, 18 Nov 2010 12:04:21 -0600 Eric Sandeen sand...@redhat.com wrote: On 11/18/10 11:10 AM, Andrew Morton wrote: On Thu, 18 Nov 2010 08:55:18 -0600 Eric Sandeen

Re: [patch] fix up lock order reversal in writeback

2010-11-18 Thread Andrew Morton
On Thu, 18 Nov 2010 13:51:15 -0500 Chris Mason chris.ma...@oracle.com wrote: If those functions fix a testcase then it was by sheer luck, and the fs's ENOSPC handling is still busted. For a start writeback_inodes_sb_if_idle() is a no-op if the device isn't idle! Secondly, if the
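Andrew's first point is about control flow. An "if idle" helper does nothing at all when the device is busy, so a caller cannot rely on it to free space. Below is a minimal sketch of that shape. It also assumes, for illustration only, a non-blocking acquire, so the helper backs off instead of waiting when the superblock lock is contended. `writeback_if_idle`, `device_busy` and `start_writeback` are hypothetical names. A `threading.Lock` stands in for the kernel's `s_umount` rw-semaphore.

```python
import threading

def writeback_if_idle(sb_lock: threading.Lock, device_busy: bool, start_writeback) -> bool:
    """Start writeback only if the device is idle and the lock is free.

    Returns True if writeback was started, False if the call was a no-op.
    """
    if device_busy:
        return False  # busy device: nothing happens, no space is freed
    if not sb_lock.acquire(blocking=False):
        return False  # lock contended: back off rather than wait (assumed policy)
    try:
        start_writeback()
        return True
    finally:
        sb_lock.release()

calls = []
lock = threading.Lock()
print(writeback_if_idle(lock, device_busy=True, start_writeback=lambda: calls.append("wb")))   # False
print(writeback_if_idle(lock, device_busy=False, start_writeback=lambda: calls.append("wb")))  # True
```

Both False paths return without writing anything back. That is why a filesystem's ENOSPC handling cannot depend on such a helper for correctness.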

Re: [patch] fix up lock order reversal in writeback

2010-11-18 Thread Chris Mason
Excerpts from Andrew Morton's message of 2010-11-18 15:22:38 -0500: On Thu, 18 Nov 2010 13:51:15 -0500 Chris Mason chris.ma...@oracle.com wrote: If those functions fix a testcase then it was by sheer luck, and the fs's ENOSPC handling is still busted. For a start

Re: [PATCH 1/6] fs: add hole punching to fallocate

2010-11-18 Thread Jan Kara
On Wed 17-11-10 20:46:15, Josef Bacik wrote: Hole punching has already been implemented by XFS and OCFS2, and has the potential to be implemented on both BTRFS and EXT4 so we need a generic way to get to this feature. The simplest way in my mind is to add FALLOC_FL_PUNCH_HOLE to fallocate()

Re: [patch] fix up lock order reversal in writeback

2010-11-18 Thread Jan Kara
On Wed 17-11-10 22:28:34, Andrew Morton wrote: I'm not sure that s_umount versus i_mutex has come up before. Logically I'd expect i_mutex to nest inside s_umount. Because s_umount is a per-superblock thing, and i_mutex is a per-file thing, and files live under superblocks. Nesting s_umount

[PATCH 1/2] btrfs: Check if dest_offset is block-size aligned before cloning file

2010-11-18 Thread Li Zefan
We've done the check for src_offset and src_length, and we should also check dest_offset, otherwise we'll corrupt the destination file: (After cloning file1 to file2 with unaligned dest_offset) # cat /mnt/file2 cat: /mnt/file2: Input/output error Signed-off-by: Li Zefan l...@cn.fujitsu.com
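The fix extends the existing alignment checks to the destination offset. Below is a minimal sketch of that validation. It assumes a simple rule that all three values must be multiples of the filesystem block size. `clone_range_ok` is a hypothetical helper, not the btrfs ioctl code, and the 4096-byte block size is an assumed example value.

```python
def clone_range_ok(src_offset: int, src_length: int, dest_offset: int, block_size: int) -> bool:
    """Return True only if every offset/length in the clone request is block aligned."""
    values = (src_offset, src_length, dest_offset)
    return all(v % block_size == 0 for v in values)

BLOCK = 4096  # assumed block size for the example
print(clone_range_ok(0, 8192, 4096, BLOCK))  # True
print(clone_range_ok(0, 8192, 1000, BLOCK))  # False: unaligned dest_offset
```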

[PATCH 2/2] btrfs: Set file size correctly in file clone

2010-11-18 Thread Li Zefan
Set src_offset = 0, src_length = 20K, dest_offset = 20K. And the original filesize of the dest file 'file2' is 30K: # ls -l /mnt/file2 -rw-r--r-- 1 root root 30720 Nov 18 16:42 /mnt/file2 Now clone file1 to file2, the dest file should be 40K, but it still shows 30K: # ls -l /mnt/file2
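The expected size follows from the clone range. The destination must grow to at least dest_offset + src_length, and it must never shrink. Below is a minimal sketch of that rule, using the numbers from the report. `size_after_clone` is a hypothetical helper.

```python
K = 1024

def size_after_clone(old_size: int, dest_offset: int, src_length: int) -> int:
    """Destination size after cloning: extended only if the clone ends past EOF."""
    return max(old_size, dest_offset + src_length)

print(size_after_clone(30 * K, 20 * K, 20 * K))  # 40960, i.e. 40K
```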

Re: Symlinks' device numbers differ from regular files'

2010-11-18 Thread Li Zefan
Toke Høiland-Jørgensen wrote: Hi I am having a problem with my btrfs partitions: symlinks are reported to have different device numbers than directories and regular files, even though they are on the same partition. This causes my backup software to mess up backing up the symlinks. An
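Backup tools commonly detect filesystem boundaries by comparing `st_dev`. For a symlink itself, that value comes from `lstat()` rather than `stat()`. Below is a minimal Python sketch that checks whether a symlink reports the same device number as its parent directory. `symlink_device_mismatch` is a hypothetical helper.

```python
import os
import tempfile

def symlink_device_mismatch(link_path: str) -> bool:
    """True if the symlink's own st_dev differs from its parent directory's."""
    link_dev = os.lstat(link_path).st_dev  # lstat: describe the link, not its target
    parent = os.path.dirname(os.path.abspath(link_path))
    return link_dev != os.stat(parent).st_dev

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "file")
    open(target, "w").close()
    link = os.path.join(d, "link")
    os.symlink(target, link)
    # False when the filesystem reports consistent device numbers.
    print(symlink_device_mismatch(link))
```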

[PATCH] Btrfs: fix more ESTALE problems with NFS

2010-11-18 Thread Josef Bacik
When creating new inodes we don't set up inode->i_generation. So if we generate an fh with a newly created inode we save the generation of 0, but if we flush the inode to disk and have to read it back when getting the inode on the server we'll have the right i_generation, so gens won't match and we
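The failure comes from comparing two generation numbers: the one stored in the NFS file handle and the one the server sees when it later looks the inode up. Below is a minimal sketch of that check. It assumes a simplified handle made of (inode number, generation). `FileHandle`, `Inode`, `encode_fh` and `lookup` are hypothetical, not the btrfs export code, and the inode and generation numbers are illustrative values.

```python
import errno
from dataclasses import dataclass

@dataclass(frozen=True)
class FileHandle:
    ino: int
    generation: int

@dataclass
class Inode:
    ino: int
    generation: int

def encode_fh(inode: Inode) -> FileHandle:
    """Save the inode number and generation the client will present later."""
    return FileHandle(inode.ino, inode.generation)

def lookup(fh: FileHandle, inode_table: dict) -> Inode:
    """Resolve a handle, raising ESTALE if the generation no longer matches."""
    inode = inode_table.get(fh.ino)
    if inode is None or inode.generation != fh.generation:
        raise OSError(errno.ESTALE, "stale file handle")
    return inode

# The bug: a new in-memory inode has generation 0, but the copy read back
# from disk later carries the real generation (illustrative numbers).
fh = encode_fh(Inode(ino=257, generation=0))
reloaded = {257: Inode(ino=257, generation=12)}
try:
    lookup(fh, reloaded)
except OSError as e:
    print(e.errno == errno.ESTALE)  # True

# With the generation set at creation time, the handle resolves.
fixed = Inode(ino=257, generation=12)
print(lookup(encode_fh(fixed), {257: fixed}) is fixed)  # True
```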

tiny btrfs bug.....

2010-11-18 Thread Evert Vorster
Hi there. I'm running my root on a raid1 btrfs partition stretching over two partitions on two separate disks. Each partition is 30GB, so I would expect the raid to be 30GB big, being a mirror and all. Instead, df reports the size of the partition to be about 60GB. For each GB written to the
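The numbers in the report are consistent with df showing raw space across both devices rather than usable mirrored space. Below is a minimal sketch of the arithmetic for two devices in RAID1. It assumes every data block is stored once on each device. `raid1_report` is a hypothetical helper.

```python
def raid1_report(dev_a_gb: float, dev_b_gb: float) -> dict:
    """Raw vs. usable space for a two-device mirror."""
    raw = dev_a_gb + dev_b_gb
    usable = min(dev_a_gb, dev_b_gb)  # each block needs a copy on both devices
    return {"raw_gb": raw, "usable_gb": usable, "raw_per_gb_written": 2}

print(raid1_report(30, 30))  # {'raw_gb': 60, 'usable_gb': 30, 'raw_per_gb_written': 2}
```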

Re: tiny btrfs bug.....

2010-11-18 Thread Chris Ball
Hi, Instead, df reports the size of the partition to be about 60GB. For each GB written to the partition, 2GB gets used. https://btrfs.wiki.kernel.org/index.php/FAQ#Why_does_df_show_incorrect_free_space_for_my_RAID_volume.3F -- Chris Ball c...@laptop.org One Laptop Per Child

Re: [patch] fix up lock order reversal in writeback

2010-11-18 Thread Nick Piggin
On Thu, Nov 18, 2010 at 09:58:31AM -0800, Andrew Morton wrote: On Thu, 18 Nov 2010 19:18:22 +1100 Nick Piggin npig...@kernel.dk wrote: On Wed, Nov 17, 2010 at 10:28:34PM -0800, Andrew Morton wrote: Logically I'd expect i_mutex to nest inside s_umount. Because s_umount is a

Re: [patch] fix up lock order reversal in writeback

2010-11-18 Thread Nick Piggin
On Fri, Nov 19, 2010 at 01:45:52AM +0100, Jan Kara wrote: On Wed 17-11-10 22:28:34, Andrew Morton wrote: The fact that a call to ->write_begin can randomly return with s_umount held, to be randomly released at some random time in the future is a bit ugly, isn't it? write_begin is a pretty

Btrfs_truncate ?

2010-11-18 Thread Smets, Jan (Jan)
Hi list This happened when running an iozone test over ceph, it was doing lots of random reads. I have no idea how to properly interpret this, I should find it out. Let me know if you need something else. Thanks! [69003.803272] [ cut here ] [69003.807987] kernel