I've just discovered that a more recent version of coreutils (8.12) than
the one I currently have installed (5.97) adds the --parallel option to
sort. However, when I sort a large file I don't see any speedup with
--parallel=8 over --parallel=1, and CPU usage stays below 100%. I'm on a
32-core system with 128 GB RAM and would like to sort a stream of
several hundred million lines in less time.
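
For reference, the comparison is roughly along these lines. This is only a
minimal sketch: big.txt is a small placeholder generated on the spot, not
the real data, and the time_sort helper and output file names are made up
for illustration.

```shell
# Placeholder input: a small shuffled file standing in for the real data.
seq 1 200000 | shuf > big.txt

# Hypothetical helper: run sort with a given --parallel value and report
# the elapsed wall-clock time in whole seconds.
time_sort() {
    start=$(date +%s)
    sort --parallel="$1" big.txt > "sorted_p$1.txt"
    end=$(date +%s)
    echo "--parallel=$1: $((end - start))s"
}

# Same sort, differing only in the thread limit.
time_sort 1
time_sort 8

# Both runs should produce identical output.
cmp sorted_p1.txt sorted_p8.txt && echo "outputs match"
```

On an input this small the two timings will not differ meaningfully; the
point is only to show which two invocations are being compared.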

 

I'm also investigating GNU parallel. Any comments on the pros and cons
of each? For example, does GNU sort parallelise the merge step? In my
limited experience, GNU parallel only parallelises the sorting of the
chunks; a single thread then does the merge across all the smaller
sorted files.
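
The sort-chunks-then-merge pattern described above can be sketched as
follows. Note the assumptions: this uses plain shell background jobs in
place of GNU parallel, the chunk count of 4 is arbitrary, and input.txt
and the chunk.* names are placeholders rather than the real files.

```shell
# Placeholder input standing in for the real data.
seq 1 100000 | shuf > input.txt

# Split into 4 chunks on line boundaries (GNU split).
split -n l/4 input.txt chunk.

# Sort each chunk concurrently, one background job per chunk.
for f in chunk.??; do
    sort "$f" > "$f.sorted" &
done
wait

# The merge of the pre-sorted chunks is a single sort -m invocation,
# which is the serial step the question is about.
sort -m chunk.??.sorted > merged.txt
```

The result should match what a plain sort of input.txt would produce; only
the chunk-sorting phase runs concurrently here.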

 

Cheers,

Nathan



 
Nathan Watson-Haigh
Senior Bioinformatician | The Australian Wine Research Institute
Waite Precinct, Hartley Grove cnr Paratoo Road, Urrbrae (Adelaide) SA 5064 | 
http://www.awri.com.au/contact/map.asp
PO Box 197, Glen Osmond SA 5064, Australia
T: +61 8 83136836 (direct) | T: +61 8 83136600 | F: +61 8 83136601
www: http://www.awri.com.au/ | http://www.awri.com.au/events/calendar/

