On Thu, Aug 6, 2015 at 5:26 AM, Sage Weil <sw...@redhat.com> wrote:
> Today I learned that syncfs(2) does an O(n) search of the superblock's
> inode list searching for dirty items. I've always assumed that it was
> only traversing dirty inodes (e.g., a list of dirty inodes), but that
> appears not to be the case, even on the latest kernels.
>
I checked the syncfs code in the 3.10 and 4.1 kernels. I think both kernels
only traverse dirty inodes (inodes in the
bdi_writeback::{b_dirty,b_io,b_more_io} lists). What am I missing?

> That means that the more RAM in the box, the larger (generally) the inode
> cache, the longer syncfs(2) will take, and the more CPU you'll waste doing
> it. The box I was looking at had 256GB of RAM, 36 OSDs, and a load of ~40
> servicing a very light workload, and each syncfs(2) call was taking ~7
> seconds (usually to write out a single inode).
>
> A possible workaround for such boxes is to turn
> /proc/sys/vm/vfs_cache_pressure way up (so that the kernel favors caching
> pages instead of inodes/dentries)...
>
> I think the take-away though is that we do need to bite the bullet and
> make FileStore f[data]sync all the right things so that the syncfs call
> can be avoided. This is the path you were originally headed down,
> Somnath, and I think it's the right one.
>
> The main thing to watch out for is that according to POSIX you really need
> to fsync directories. With XFS that isn't the case since all metadata
> operations are going into the journal and that's fully ordered, but we
> don't want to allow data loss on e.g. ext4 (we need to check what the
> metadata ordering behavior is there) or other file systems.
>
> :(
>
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
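
For reference, the "fsync the file and then its directory" pattern described
above could look roughly like the following. This is only a minimal sketch in
Python, not FileStore code: the helper name durable_write is hypothetical, and
it assumes a Linux system where os.O_DIRECTORY is available and a directory
can be opened read-only and passed to fsync.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data to path, then flush the file and its parent directory.

    Hypothetical helper for illustration only; not taken from FileStore.
    """
    # Create or truncate the file and write the whole payload.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # Flush this one inode's data and metadata (no syncfs involved).
        os.fsync(fd)
    finally:
        os.close(fd)

    # Flush the parent directory too, so the new directory entry itself
    # is durable on file systems that do not order metadata for us.
    parent = os.path.dirname(os.path.abspath(path))
    dirfd = os.open(parent, os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "object")
        durable_write(target, b"hello")
        print(os.path.getsize(target))
```

The cost of each call is tied to the one file and one directory being flushed,
not to the size of the inode cache, which is the point of avoiding syncfs
here. Whether the directory fsync can be skipped on a given file system is
exactly the open question raised above for ext4.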