On Mon, 26 Mar 2007 11:20:11 +0200 Miklos Szeredi <[EMAIL PROTECTED]> wrote:
> > > > > It also makes a deadlock possible when one filesystem is writing data > > > > > through another, and the balance_dirty_pages() for the lower > > > > > filesystem is stalling the writeback for the upper filesystem's > > > > > data (*). > > > > > > > > I still don't understand this one. I got lost when belatedly told that > > > > i_mutex had something to do with it. > > > > > > This deadlock only happens, if there's some bottleneck for writing > > > data to the lower filesystem. This bottleneck could be > > > > > > - i_mutex, preventing parallel writes to the same inode > > > - limited number of filesystem threads > > > - limited request queue length in the upper filesystem > > > > > > Imagine it this way: balance_dirty_pages() for the lower filesystem is > > > stalling a write() because dirty pages in the upper filesystem are > > > over the limit. Because there's a bottleneck for writing to the lower > > > filesystem, this is stalling _other_ writes from completing. So > > > there's no progress in writing back pages from the upper filesystem. > > > > You mean that someone is stuck in balance_dirty_pages() against the lower > > fs while holding locks which prevent writes into the upper fs from > > succeeding? > > > > Draw us a picture ;) > > Well, not a picture, but a sort of indented call trace: > > [some process, which has a fuse file writably mmaped] > write fault on upper filesystem > balance_dirty_pages > loop... > submit write requests This, I assume, is the upper fs > --------------------------------- > [fuse loopback fs thread 1] > read request from /dev/fuse > sys_write > mutex_lock(i_mutex) > ... > copy data to page cache > balance_dirty_pages > loop ... > submit write requests > write requests completed ... > dirty still over limit ... > ... loop forever > > [fuse loopback fs thread 2] > read request from /dev/fuse > sys_write > mute_lock(i_mutex) blocks And these, I assume, are handling what you term the lower fs. > > The lower filesystem (e.g. ext3) has completed the single write > request that was sent to it, and then it's just looping in > balance_dirty_pages. The upper (fuse) filesystem has all the dirty > data (over the threshold), either still dirty or waiting in the > request queue as writeback. > > Does this help? yup. Interesting problem. I don't suppose that it'd be appreiated if I were to commend the use of O_DIRECT for handling the lower fs ;) Let me think about that a bit, after I've made the latest shitpile people have inflicted upon me begin to look like it has a chance of compiling. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/