On Tue, Nov 05, 2013 at 02:29:10PM +0000, Jonathan Dowland wrote:
> On Tue, Nov 05, 2013 at 03:13:10PM +0400, Reco wrote:
> > find . -type f -name 'popularity-*' -print0 | xargs -0rn 20 rm -f
> 
> I idly wonder (don't know) to what extent find might parallelize the
> unlinks with -delete. A cursory scan of the semantics would suggest it
> could potentially do so: it's not clear that a single unlink failing
> should stop future unlinks (merely spew errors and consider the -delete
> operation as a whole to have failed).

xargs parallelism is optional. The point is that you have one process
which finds files, and another (or another group of them) which deletes
them. That helps utilize multiple CPUs.
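For instance (a sketch of the same pipeline with explicit parallelism;
-P is specific to GNU xargs, and 4 is an arbitrary process count, tune
it to your machine):

$ find . -type f -name 'popularity-*' -print0 | xargs -0 -r -n 20 -P 4 rm -f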


> > Arguably the fastest way to delete all this mess should be
> > 
> > perl -e 'for(<popularity-*>){((stat)[9]<(unlink))}'
> 
> Not sure why loading perl (>1.6M) should be faster than find (~300K)
> and I think '-delete' behaviour is essentially unlink under the hood.

It's not the binary size which matters, it's the algorithm:

$ for x in $(seq 1 500000); do echo somefile > $x; done
$ time perl -e 'for(<*>){((stat)[9]<(unlink))}'

real    0m24.047s
user    0m4.785s
sys     0m16.926s

$ for x in $(seq 1 500000); do echo somefile > $x; done
$ time find -type f -delete

real    4m27.799s
user    0m0.831s
sys     0m17.961s
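
In case the golfing obscures it: the one-liner just stats each matching
file and then unlinks it, the result of the comparison is thrown away.
A more readable equivalent would be roughly:

$ perl -e 'for my $f (glob "*") { my $mtime = (stat $f)[9]; unlink $f }'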

Basically, the difference is that find issues an fstatat64 syscall for
each file, while this perl one-liner uses lstat64 and stat64. Use strace
to check it in your environment; on another OS the results could be
different.
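
Something like this should show the syscall mix (strace -c prints a
per-syscall summary at the end; exact counts will depend on your setup,
and you'll want to re-create the test files before each run):

$ strace -c find . -type f -delete
$ strace -c perl -e 'for(<*>){((stat)[9]<(unlink))}'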

Reco

