One of our colleagues wrote:

> Instead of “cat file | tail -n +2”, do “tail -n +2 file”.
>
> Every one of those “$(…)” creates a subshell, with all the attendant
> overhead. Many of those, run in parallel, may be causing a traffic jam
> for resources. Have you tried reducing the number of processes launched
> in parallel to see if overall performance improves? If that find command
> returns hundreds of files, you may be overwhelming your system. 8 to 10
> parallel processes seem to be the optimum on most VMs I've worked on in
> the recent past with normal amounts of memory and 2 CPUs.
>
> It may be that disk I/O is your enemy here: a kernel operation that will
> put the CPU in a wait state while it waits for the disk to deliver up
> the data. Again, especially with disk I/O, less sometimes gives you more.
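Both suggestions fit in one pipeline. A minimal sketch, assuming the files come from a find like the one described; the /data path, the *.txt pattern, and the .body output name are hypothetical:

    # Before (one cat + one tail per file, unbounded parallelism):
    #   for f in $(find /data -name '*.txt'); do cat "$f" | tail -n +2 > "$f.body" & done
    # After: no cat, and xargs -P caps concurrent processes at 8.
    find /data -name '*.txt' -print0 |
        xargs -0 -n 1 -P 8 sh -c 'tail -n +2 "$1" > "$1.body"' sh

The -print0/-0 pair keeps file names with spaces intact, and -P is the knob to tune if 8 turns out not to be the sweet spot.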
Using “tail -n +x file” removes a process from each pipeline; I'll give that a whirl. Yes, the find command not only finds hundreds of files, it finds tens of thousands of them. My shared file system is NFS, and I hear tell NFS is not very good for parallel processing. Another colleague of ours suggested rewriting it in Perl, Python, Ruby, etc. Thanks for the input.

--
Eric Morgan
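Short of a full rewrite in Perl or Python, the "fewer processes" idea can be pushed further in the shell itself: one awk process can strip the header line from every file find returns, so tens of thousands of tail invocations collapse into a handful of processes. A sketch under the same hypothetical /data path, *.txt pattern, and .body suffix as above:

    # FNR resets to 1 at the start of each input file, so awk skips the
    # first line of every file; close() keeps the script from running
    # out of open file descriptors across tens of thousands of outputs.
    find /data -name '*.txt' -print0 |
        xargs -0 awk 'FNR == 1 { if (out != "") close(out); out = FILENAME ".body"; next }
                      { print > out }'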
