I'd suggest simply "Performance Profiling and Optimization" as a title for the wiki chapter. With modern compilers and constantly shifting machine architectures, profiling is needed just to set compiler flags intelligently.
I've had time to digest the papers, reread parts of Maros, and look at GLPK and my own problem. A few comments:

Maros is very good. In particular, I'd like to direct attention to his discussion of sparse arithmetic operations. What he suggests is dead simple to implement and something I'd not seen before; for typical sparse problems it's a big speedup. (A rough sketch of the idea is in the P.S. below.) Much of his discussion of parallel computing is of little interest to me, but for someone without much prior background in parallel computing it would be very beneficial, even if a bit dated (ca. 2001).

All the papers address a point in time when shared-memory multiprocessors were being eclipsed by distributed machines with non-uniform memory access (NUMA). For many reasons, most performance gains must come from NUMA machines. I suggest "Computer Architecture: A Quantitative Approach" by Hennessy and Patterson to anyone interested, with the caveat that you'll need to know, or learn, a lot about hardware to benefit. I plan to order the 5th edition; I read the first three when they came out but missed the 4th. It's a constantly moving target.

The reemergence of shared-memory symmetric multiprocessing (SMP), in the form of multicore workstations with many gigabytes of memory, makes some techniques that had fallen into disfavor worth looking at again. Hall is correct about really large problems, but most intermediate problems can benefit from exploiting SMP, provided due attention is paid to cache organization.

One quick comment about GPUs. Exploiting an attached processor (AP) successfully depends on the ratio of communication to computation latency: if the labor of exploiting the AP is high, it's often not worth the trouble. The CPU has caught up with and passed the AP several times in the last 20 years. Seismic processing codes are littered with coding artifacts from being ported to APs, and it can get really difficult to read the code. So I'd leave GPUs to those interested in algorithm research. Fun stuff, but very difficult and demanding.

In reading the GLPK source code, it appears that the primal simplex solver is doing dense arithmetic. If I've missed something, please let me know. There are some sparse arithmetic codes, but I don't see them being used anywhere, and they appear to be more complex than what Maros describes, which is both elegant and fast. For those solving large sparse problems with the simplex method, there are substantial gains to be had.

In my own case I have a large number of dense problems to solve, so neither sparse arithmetic nor multithreading will improve my time to solution. Running multiple job queues, as in the example I wrote for the wikibook, is the best I can do to exploit multiple cores in my work. (A bare-bones version of that pattern is sketched in the P.P.S.)

Have Fun!

Reg
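P.S. For anyone who wants to see what the sparse arithmetic Maros describes looks like in code, here is a minimal sketch of the scatter/gather (expanded accumulator) idea as I understand it. The names (spvec, sp_axpy, sp_gather) and the layout are mine, not Maros's and not GLPK's; the point is only that every operation touches O(nonzeros) entries rather than O(m).

#include <stdio.h>

typedef struct {        /* packed sparse vector: val[k] sits in row ind[k] */
    int nnz;
    int *ind;
    double *val;
} spvec;

/* w += alpha * x, touching only the nonzeros of x.
 * w[] is a full-length work array that is all zeros between calls;
 * mark[] and active[] remember which entries have been touched. */
static void sp_axpy(double alpha, const spvec *x,
                    double *w, char *mark, int *active, int *nactive)
{
    for (int k = 0; k < x->nnz; k++) {
        int i = x->ind[k];
        if (!mark[i]) {                 /* first touch of row i */
            mark[i] = 1;
            active[(*nactive)++] = i;
        }
        w[i] += alpha * x->val[k];
    }
}

/* pack the accumulator back into y and restore w[]/mark[] to all zeros,
 * again touching only the entries that were actually used */
static void sp_gather(double *w, char *mark, const int *active, int nactive,
                      spvec *y)
{
    y->nnz = 0;
    for (int k = 0; k < nactive; k++) {
        int i = active[k];
        if (w[i] != 0.0) {              /* drop exact cancellations */
            y->ind[y->nnz] = i;
            y->val[y->nnz++] = w[i];
        }
        w[i] = 0.0;
        mark[i] = 0;
    }
}

int main(void)
{
    enum { m = 8 };                     /* full vector length */
    double w[m] = {0};
    char mark[m] = {0};
    int active[m], nactive = 0;

    int    xi[] = {1, 4, 6}, zi[] = {4, 7};
    double xv[] = {2.0, -1.0, 3.0}, zv[] = {5.0, 1.5};
    spvec x = {3, xi, xv}, z = {2, zi, zv};

    int yi[m]; double yv[m];
    spvec y = {0, yi, yv};

    sp_axpy(1.0, &x, w, mark, active, &nactive);    /* w += x   */
    sp_axpy(2.0, &z, w, mark, active, &nactive);    /* w += 2*z */
    sp_gather(w, mark, active, nactive, &y);        /* y = packed result */

    for (int k = 0; k < y.nnz; k++)
        printf("y[%d] = %g\n", y.ind[k], y.val[k]);
    return 0;
}

Because sp_gather leaves the work array and the marks clean, the same scratch storage can be reused for every pivot step, which is where the payoff comes from on sparse problems.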
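P.P.S. Since the multiple-job-queue approach came up: below is a bare-bones version of the pattern in C. It is not the wikibook example, just a sketch. It keeps up to NWORKERS independent solver processes alive at once over a list of model files; NWORKERS and the glpsol invocation (--model) are placeholders to adapt to your own setup.

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

#define NWORKERS 4      /* roughly the number of cores to keep busy */

static void run_job(const char *model)
{
    /* each job is an independent process, so the solves run in parallel
     * without any threading inside the solver itself */
    execlp("glpsol", "glpsol", "--model", model, (char *) NULL);
    perror("execlp");                   /* only reached if exec fails */
    _exit(127);
}

int main(int argc, char **argv)
{
    int running = 0;

    for (int i = 1; i < argc; i++) {
        if (running == NWORKERS) {      /* queue full: wait for a free slot */
            wait(NULL);
            running--;
        }
        pid_t pid = fork();
        if (pid == 0)
            run_job(argv[i]);           /* child never returns */
        else if (pid > 0)
            running++;
        else
            perror("fork");
    }
    while (running-- > 0)               /* drain the queue */
        wait(NULL);
    return 0;
}

Run it as, say, ./runq model1.mod ... modelN.mod. Since each solve is a separate process there is nothing to change inside GLPK, which is why this is about the only parallelism available for a pile of dense problems.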
