One of our colleagues wrote:

> Instead of “cat file | tail -n +2”, do “tail -n +2 file”.
>  
> Every one of those “$(…)” creates a subshell, with all the attendant 
> overhead.  Many of those, run in parallel, may be causing a traffic jam for 
> resources.  Have you tried reducing the number of processes launched in 
> parallel to see if overall performance improves?  If that find command returns 
> hundreds of files, you may be overwhelming your system.  8 to 10 parallel 
> processes seems to be the optimum on most VMs I’ve worked on in the recent 
> past with normal amounts of memory and 2 CPUs.
>  
> It may be that disk i/o is your enemy here – that is a kernel process that will 
> put the CPU in a wait state while it waits for the disk to deliver up the 
> data.  Again, especially with disk i/o, less sometimes gives you more.


Using tail -n +2 file instead of cat file | tail -n +2 removes an extra process. I'll give that a whirl.
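For anyone following along, here is a minimal sketch of the difference; /tmp/data.csv is a made-up example file, not one from my actual job:

```shell
# Create a small example file with a header line.
printf 'header\nrow1\nrow2\n' > /tmp/data.csv

# Two processes and a pipe: cat feeds tail.
cat /tmp/data.csv | tail -n +2

# One process, no pipe: tail reads the file directly.
tail -n +2 /tmp/data.csv
```

Both print the file minus its header; the second just does it with one fewer process, which adds up when you run it tens of thousands of times.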

Yes, the find command not only finds hundreds of files, it finds tens of 
thousands of files. 

My shared file system is NFS, and I hear tell that NFS is not very good for 
parallel processing.
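One way to try the 8-to-10-jobs suggestion is to let xargs -P cap the parallelism instead of backgrounding every file at once. This is only a sketch: the /tmp/demo directory and wc -l stand in for my real data directory and per-file script, and it assumes an xargs that supports -P and -0 (GNU and BSD both do):

```shell
# Set up a stand-in data directory with a couple of files.
mkdir -p /tmp/demo
touch /tmp/demo/a.txt /tmp/demo/b.txt

# Run at most 8 jobs at a time, one file per job.
# -print0/-0 keeps filenames with spaces intact.
find /tmp/demo -type f -name '*.txt' -print0 \
  | xargs -0 -n 1 -P 8 wc -l
```

The find still returns everything; xargs just meters out the work so tens of thousands of files don't all hit the (NFS) disk at once.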

Another colleague of ours suggested rewriting it in Perl, Python, Ruby, etc. 

Thanks for the input.

--
Eric Morgan
