2012/3/1 Chris Mason <chris.ma...@oracle.com>:
> On Wed, Feb 29, 2012 at 11:44:31PM -0500, Theodore Tso wrote:
>> You might try sorting the entries returned by readdir by inode number before 
>> you stat them.    This is a long-standing weakness in ext3/ext4, and it has 
>> to do with how we added hashed tree indexes to directories in (a) a 
>> backwards compatible way, that (b) was POSIX compliant with respect to 
>> adding and removing directory entries concurrently with reading all of the 
>> directory entries using readdir.
>>
>> You might try compiling spd_readdir from the e2fsprogs source tree (in the 
>> contrib directory):
>>
>> http://git.kernel.org/?p=fs/ext2/e2fsprogs.git;a=blob;f=contrib/spd_readdir.c;h=f89832cd7146a6f5313162255f057c5a754a4b84;hb=d9a5d37535794842358e1cfe4faa4a89804ed209
>>
>> … and then using that as a LD_PRELOAD, and see how that changes things.
>>
>> The short version is that we can't easily do this in the kernel since it's a 
>> problem that primarily shows up with very big directories, and using 
>> non-swappable kernel memory to store all of the directory entries and then 
>> sort them so they can be returned in inode number order just isn't practical. It
>> is something which can be easily done in userspace, though, and a number of
>> programs (including mutt for its Maildir support) do so, and it helps
>> greatly for workloads where you are calling readdir() followed by something
>> that needs to access the inode (e.g., stat, unlink, etc.)
>>
>
> For reading the files, the acp program I sent him tries to do something
> similar.  I had forgotten about spd_readdir though, we should consider
> hacking that into cp and tar.
>
> One interesting note is the page cache used to help here.  Picture two
> tests:
>
> A) time tar cf /dev/zero /home
>
> and
>
> cp -a /home /new_dir_in_new_fs
> unmount / flush caches
> B) time tar cf /dev/zero /new_dir_in_new_fs
>
> On ext, the time for B used to be much shorter than the time for A
> because the files would get written back to disk in roughly htree order.
> Based on Jacek's data, that isn't true anymore.

I ran tests with both. The subjects are acp and spd_readdir used with
tar, all on ext4:
1) acp: http://91.234.146.107/~difrost/seekwatcher/acp_ext4.png
2) spd_readdir: http://91.234.146.107/~difrost/seekwatcher/tar_ext4_readir.png
3) both: http://91.234.146.107/~difrost/seekwatcher/acp_vs_spd_ext4.png

acp looks much better than spd_readdir, but the directory copy time with
spd_readdir dropped to 52 min 39 s (30 min less).
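
For anyone who wants to try the userspace approach Ted describes without
the LD_PRELOAD, here is a minimal sketch in Python: read the whole
directory first, sort the entries by inode number, and only then stat
them. The function name stat_in_inode_order is made up for illustration,
and this is not the code from spd_readdir.c, only the same idea as I
understand it.

```python
import os


def stat_in_inode_order(path):
    """Stat every entry of `path`, visiting them in ascending inode order.

    Returns a list of (inode, name, stat_result) tuples.
    Hypothetical helper, for illustration only.
    """
    # Pass 1: read the complete directory listing. On Linux the inode
    # number comes from the directory entry itself, so this pass should
    # not touch the inodes.
    with os.scandir(path) as it:
        entries = [(entry.inode(), entry.name) for entry in it]

    # Sort by inode number instead of keeping the order readdir returned.
    entries.sort()

    # Pass 2: now access the inodes, in inode order.
    results = []
    for ino, name in entries:
        st = os.lstat(os.path.join(path, name))
        results.append((ino, name, st))
    return results
```

Note that this keeps the whole listing in memory before sorting. That is
fine in userspace, but it is the part Ted says is not practical to do in
the kernel for very big directories.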

-Jacek