On 14.02.2012 10:57, Joerg Schilling wrote:
> Florian Philipp <[email protected]> wrote:
> 
>>> Even if the i-nodes are sequential on-disk, there's no reason to think
>>> that the data blocks associated with the inodes are in any particular
>>> order with respect to the i-nodes themselves.
>>
>> You could probably find the intended order by using debugfs (at least
>> for ext*). The following command should output the first physical block
>> of every file:
>> find /var/db/portage/ -type f -printf 'bmap <%i> 0\n' | sudo debugfs
>> /dev/mapper/vg-portage
> 
> This kind of order is not important for copy speed.
> 
> Copy speed is dominated by write speed and write speed is dominated by seeks 
> that are a result of keeping meta data up to date.
> 
> Jörg
> 

I cannot confirm that hypothesis; in my tests the read order makes a
large difference.

Test setup:
1x 7200 rpm 2.5" HDD
/var/db/portage is my portage tree, ext4
/dev/mapper/vg-portage is its block device
/tmp is ext4

First test --- copy the whole tree with plain `cpio` (I measured it;
performance is similar to `cp -a`):
$ echo 1 >/proc/sys/vm/drop_caches
$ time find /var/db/portage/ -type f -print0 |
        cpio -p0 --make-directories /tmp/portage/

real    11m52.657s
user    0m1.848s
sys     0m19.802s

Second test --- Sort by starting physical block number:
$ echo 1 >/proc/sys/vm/drop_caches
$ FIFO=/tmp/$(uuidgen).fifo
$ mkfifo "$FIFO"
$ time find /var/db/portage/ -type f \
        -fprintf "$FIFO" 'bmap <%i> 0\n' -print0 |
  tr '\n\0' '\0\n' | paste <(
        debugfs -f "$FIFO" /dev/mapper/vg-portage |
        grep -E '^[[:digit:]]+') - |
  sort -k 1,1n | cut -f 2- | tr '\n\0' '\0\n' |
  cpio -p0 --make-directories /tmp/portage/
$ unlink "$FIFO"

real    2m8.400s
user    0m1.888s
sys     0m15.417s

Replacing `cpio` with `xargs -0 cat >/dev/null` (read only, nothing
written) yields 9m27.745s unsorted and 1m11.087s sorted.
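
To put the four timings side by side, here is a small sketch that
converts the `time` output to seconds and prints the ratios. The helper
names `to_seconds` and `speedup` are made up for this illustration; only
the four durations come from the measurements above.

```sh
# Convert a `time`-style duration such as 11m52.657s to seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Ratio of two durations (first / second), rounded to two decimals.
speedup() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", a / b }'
}

speedup 11m52.657s 2m8.400s     # cpio copy, unsorted vs. sorted: prints 5.55
speedup 9m27.745s 1m11.087s     # read only, unsorted vs. sorted: prints 7.99
```

That is a factor of roughly 5.5 for the copy and roughly 8 for the pure
read.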

Some comments on the sorting script:
- Using a fifo instead of a pipe for issuing commands to debugfs is faster.
- The two `tr` commands swap newline and NUL bytes. They are there
because `paste` and `cut` cannot handle NUL-terminated lines, while file
names may contain line breaks.
- `grep` is there because `debugfs` echoes all commands. Dropping every
odd-numbered line should also work.
- A production-ready script should probably use `join` instead of
`paste` to deal with read errors of `debugfs` (for example if files are
removed between `find` and `debugfs`). Currently, such an error shifts
the block numbers against the file names.
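
The `join` idea from the last point could look roughly like the
following sketch. It does not call `debugfs`; it feeds in a simulated
transcript instead. The transcript format (an echoed
`debugfs: bmap <inode> 0` line followed by the block number, and nothing
on stdout when the lookup fails) is an assumption, as are the function
names and the sample data. File names with line breaks would still need
the `tr` swap around it.

```sh
TAB=$(printf '\t')

# Simulated `debugfs -f` transcript (format is an assumption): inode 13
# has vanished, so its command is echoed but no block number follows.
transcript() {
    printf '%s\n' 'debugfs: bmap <12> 0' '300' \
                  'debugfs: bmap <13> 0' \
                  'debugfs: bmap <14> 0' '100'
}

# Made-up "inode<TAB>name" list, as find -printf '%i\t%p\n' would print it.
names() {
    printf '%s\t%s\n' 12 first 13 gone 14 third
}

# Turn the transcript into "inode<TAB>block" pairs, skipping failed lookups.
inode_blocks() {
    awk -v OFS='\t' '
        /^debugfs: bmap </ { inode = $3; gsub(/[<>]/, "", inode); next }
        /^[0-9]+$/ && inode != "" { print inode, $1; inode = "" }
    '
}

# Join both lists on the inode, sort by block number, keep only the name.
# $1 is a scratch file for the sorted inode/block pairs.
sorted_names() {
    transcript | inode_blocks | LC_ALL=C sort -t "$TAB" -k 1,1 > "$1"
    names | LC_ALL=C sort -t "$TAB" -k 1,1 |
        LC_ALL=C join -t "$TAB" "$1" - |
        sort -t "$TAB" -k 2,2n | cut -f 3-
}

tmp=$(mktemp)
sorted_names "$tmp"     # prints "third", then "first"; "gone" is dropped
rm -f "$tmp"
```

Because the pairing is keyed on the inode number, a failed lookup only
drops that one file instead of shifting every later line.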

BTW: I wanted to test it with `star -copy` but this resulted in buffer
overflows similar to these:
http://permalink.gmane.org/gmane.comp.archivers.star.user/752

Regards,
Florian Philipp
