2012/3/1 Chris Mason <chris.ma...@oracle.com>:
> On Wed, Feb 29, 2012 at 11:44:31PM -0500, Theodore Tso wrote:
>> You might try sorting the entries returned by readdir by inode number before
>> you stat them. This is a long-standing weakness in ext3/ext4, and it has
>> to do with how we added hashed tree indexes to directories in (a) a
>> backwards compatible way, that (b) was POSIX compliant with respect to
>> adding and removing directory entries concurrently with reading all of the
>> directory entries using readdir.
>>
>> You might try compiling spd_readdir from the e2fsprogs source tree (in the
>> contrib directory):
>>
>> http://git.kernel.org/?p=fs/ext2/e2fsprogs.git;a=blob;f=contrib/spd_readdir.c;h=f89832cd7146a6f5313162255f057c5a754a4b84;hb=d9a5d37535794842358e1cfe4faa4a89804ed209
>>
>> … and then using that as a LD_PRELOAD, and see how that changes things.
>>
>> The short version is that we can't easily do this in the kernel since it's a
>> problem that primarily shows up with very big directories, and using
>> non-swappable kernel memory to store all of the directory entries and then
>> sort them so they can be returned in inode number just isn't practical. It
>> is something which can be easily done in userspace, though, and a number of
>> programs (including mutt for its Maildir support) does do, and it helps
>> greatly for workloads where you are calling readdir() followed by something
>> that needs to access the inode (i.e., stat, unlink, etc.)
>>
>
> For reading the files, the acp program I sent him tries to do something
> similar. I had forgotten about spd_readdir though, we should consider
> hacking that into cp and tar.
>
> One interesting note is the page cache used to help here. Picture two
> tests:
>
> A) time tar cf /dev/zero /home
>
> and
>
> cp -a /home /new_dir_in_new_fs
> unmount / flush caches
> B) time tar cf /dev/zero /new_dir_in_new_fs
>
> On ext, The time for B used to be much faster than the time for A
> because the files would get written back to disk in roughly htree order.
> Based on Jacek's data, that isn't true anymore.
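
For anyone who wants to try the inode-ordering idea quoted above without building an LD_PRELOAD shim, here is a minimal sketch in Python. The function name stat_in_inode_order is hypothetical, and this only illustrates the ordering trick (read all entries, sort by inode, then stat). It is not a description of what spd_readdir or acp do internally.

```python
import os


def stat_in_inode_order(path):
    """Return [(name, stat_result)] for the entries of `path`,
    calling lstat() on them in ascending inode-number order."""
    with os.scandir(path) as it:
        # On Linux, DirEntry.inode() is taken from the directory entry
        # itself, so sorting here does not trigger a stat per file.
        entries = sorted(it, key=lambda e: e.inode())
    # Touch the inodes in sorted order rather than in readdir (hash) order.
    return [(e.name, os.lstat(e.path)) for e in entries]


if __name__ == "__main__":
    for name, st in stat_in_inode_order("."):
        print(st.st_ino, name)
```

Like spd_readdir, this holds the whole directory listing in userspace memory before sorting, which is the trade-off described above for very big directories.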

I ran both tests. The subjects are acp and spd_readdir used with tar, all on ext4:

1) acp: http://91.234.146.107/~difrost/seekwatcher/acp_ext4.png
2) spd_readdir: http://91.234.146.107/~difrost/seekwatcher/tar_ext4_readir.png
3) both: http://91.234.146.107/~difrost/seekwatcher/acp_vs_spd_ext4.png

The acp trace looks much better than the spd_readdir one, but the directory copy time with spd_readdir dropped to 52m 39sec (30 min less).

-Jacek