I saw that the GNU project is a mentoring organization for this years Google Summer of Code. Would it be possible to extend my work on optimizing sort to this summer?
There are a few ideas on the table: 1) Reducing the amount of time wasted with pthread_join This idea is outlined in the TODO. This would be an optimization off of the patch Paul Eggert and I submitted a few days ago. The general idea is increase the number of threads and use inter-thread communication so that the merge process is more streamlined than it is now. 2) Merge Insertion and List Merge sort This is also outlined in the TODO. I could try my hand at merge insertion and try to see why list merge sort did not improve performance. 3) Integrating our work with the other groups work As you may know, we only parallelized the internal sort algorithm, the other group was in charge of the external sort algorithm. They opted to implement theirs in OpenMP. 4) "In-between sort" As Jim suggested here: http://lists.gnu.org/archive/html/bug-coreutils/2009-02/msg00119.html and P?draig Brady a moment later, we should also try doing something more "UNIXy", where the general idea looks like this: sort -m \ <(sort <(dice -k1,4 INPUT)) \ <(sort <(dice -k2,4 INPUT)) \ <(sort <(dice -k3,4 INPUT)) \ <(sort <(dice -k4,4 INPUT)) \ > out where dice split INPUT into 4 slices "dice -k1,4 INPUT" grabbing the first. I would be willing to work on all 4 of these this summer. These last ten weeks I think I've accustomed myself to the sort code base and these additions seem very doable. Does this sound like a good idea? -- Glen Lenker _______________________________________________ Bug-coreutils mailing list [email protected] http://lists.gnu.org/mailman/listinfo/bug-coreutils
