Here are a few comments on optimization and multicore GLPK. A search online for information about optimizing GLPK turned up only xypron's recent post about enabling SSE and fast math. GLPK is very sophisticated, so what I have to say may fall in the "not documented yet" category. Except for Maros, I've seen little mention of performance profiling and coverage analysis in the LP literature I've read; most performance metrics are limited to iteration counts and elapsed time. Serious code optimization requires a much finer granularity of analysis.
I have just completed an initial, very quick read of "Computational Techniques of the Simplex Method" by Istvan Maros. For anyone interested in working on LP codes, I think it is a necessary, but not sufficient, reference. Maros gives a good overview of many issues related to implementing a simplex code and to large-scale solvers in general. It was typeset by the author, so there are typos and non-native-speaker errors to contend with. However, all the important ideas are accompanied by numerous citations, so with the appropriate references added, it appears to provide a good map of the terrain. I qualify my statement only because I am far too ignorant of LP to make a stronger one.

I agree entirely with Andrew's earlier comments on the parallelization of GLPK. There is much work to be done on the serial algorithms before it makes any sense to attempt parallel execution. Aside from the computational cost of pricing, the particular choice of pricing rule can strongly influence solution times for particular problems, so implementing Devex and other rules seems to me a good starting point. I'm sure other opportunities to improve single-core performance exist, but this is a particularly obvious place to start.

That said, barrier-synchronized shared-memory parallel programming should be suitable for use in GLPK running on multicore workstations. Actual performance is highly dependent on cache architecture and organization, so the benefit cannot be predicted easily; considerable experimentation and deep knowledge of the hardware behavior are required to make this work well. The only reason it is worth considering at all is that a single architecture dominates the computing landscape: Intel is unlikely to do something that breaks a lot of major codes, and core counts are likely to continue to grow.

The concept is simple. In sections which are not execution-order dependent, a small number of coroutines are started by a signal. These coroutines run to completion, at which time each sends a signal to the main routine, which decrements a counter. When all the coroutines have completed, the counter reaches zero and the main routine continues. If one can avoid thrashing due to cache collisions, those portions of the code can be sped up by roughly the number of cores employed. Of course, the total speedup will be much less, as dictated by Amdahl's law. (A small pthreads-style sketch of this pattern follows below.)

In looking at some of the simplex source code, I see places where there is a loop over vector operations. The natural thing to do there is to use SSE instructions to speed up the vector operations and spread the vectors over multiple cores. (A sketch of such an inner loop also follows below.)

Though conceptually simple, none of this is easy to do. In addition, GLPK is easily the most complex code I've looked at in 20+ years of working on several million lines of scientific code. Fortunately, it's also hands down the best-written large code I've ever seen, which is a real delight. Any optimization work on a code as sophisticated as GLPK is a major undertaking which will take a long time to execute. The first step is profiling and coverage analysis.

If there is sufficient interest in this subject, I propose to implement and document performance profiling and coverage analysis of GLPK in the wikibook, using the various Netlib problem sets. For convenience, this will be restricted to the *nix environment; however, it should generally be applicable to Windows if a *nix environment package such as Cygwin is used.
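To make the barrier-synchronized idea concrete, here is a minimal fork/join sketch using POSIX threads. Everything in it is illustrative: the worker function, the slice structure, and the trivial work (doubling a vector) are invented for the example and are not taken from the GLPK sources.

    /* Minimal fork/join sketch with POSIX threads.
       Build: cc -O2 -pthread sketch.c
       All names here are hypothetical; nothing is from GLPK. */
    #include <pthread.h>
    #include <stdio.h>

    #define NWORKERS 4
    #define N 1000000

    static double x[N];

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  done = PTHREAD_COND_INITIALIZER;
    static int pending;              /* counter decremented by each worker */

    struct slice { int lo, hi; };    /* half-open range [lo, hi) */

    /* Each worker runs to completion on its slice, then signals main. */
    static void *worker(void *arg)
    {
        struct slice *s = arg;
        int i;
        for (i = s->lo; i < s->hi; i++)
            x[i] *= 2.0;             /* order-independent work */
        pthread_mutex_lock(&lock);
        if (--pending == 0)          /* last worker wakes the main routine */
            pthread_cond_signal(&done);
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NWORKERS];
        struct slice s[NWORKERS];
        int i, k;
        for (i = 0; i < N; i++) x[i] = 1.0;
        pending = NWORKERS;
        for (k = 0; k < NWORKERS; k++)
        {
            s[k].lo = k * (N / NWORKERS);
            s[k].hi = (k == NWORKERS - 1) ? N : (k + 1) * (N / NWORKERS);
            pthread_create(&tid[k], NULL, worker, &s[k]);
        }
        /* main routine blocks until the counter reaches zero */
        pthread_mutex_lock(&lock);
        while (pending > 0) pthread_cond_wait(&done, &lock);
        pthread_mutex_unlock(&lock);
        for (k = 0; k < NWORKERS; k++) pthread_join(tid[k], NULL);
        printf("x[0] = %g\n", x[0]);
        return 0;
    }

Whether something like this pays off in GLPK depends entirely on finding loops that are genuinely order independent and large enough to amortize the thread start-up and synchronization cost.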
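And here is a correspondingly minimal sketch of an SSE2 inner loop of the daxpy type (y := y + a*x), representative of the vector loops mentioned above. The function name and the loop are hypothetical; I am not claiming this is how any particular GLPK routine should be rewritten.

    /* daxpy-style inner loop with SSE2 intrinsics.
       Build with gcc -msse2 (SSE2 is available by default on x86-64).
       The function name sse_daxpy is made up for this sketch. */
    #include <emmintrin.h>   /* SSE2: __m128d, _mm_*_pd */

    /* y[i] += a * x[i], two doubles per SSE register */
    static void sse_daxpy(int n, double a, const double *x, double *y)
    {
        int i;
        __m128d va = _mm_set1_pd(a);
        for (i = 0; i + 1 < n; i += 2)
        {
            __m128d vx = _mm_loadu_pd(&x[i]);
            __m128d vy = _mm_loadu_pd(&y[i]);
            vy = _mm_add_pd(vy, _mm_mul_pd(va, vx));
            _mm_storeu_pd(&y[i], vy);
        }
        for (; i < n; i++)   /* scalar remainder for odd n */
            y[i] += a * x[i];
    }

Splitting such a loop into slices and handing the slices to worker threads would then use the fork/join pattern sketched above; whether the memory traffic lets that scale is exactly the cache question raised earlier.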
I am particularly interested in comments from Andrew, Marc, Robbie, and xypron. All have been very generous to me in different ways, and this is an attempt to repay them.

I come from a seismic processing background, where run times are measured in tens of thousands of hours for a single dataset. Fortunately, most seismic codes are trivially parallel, so a few hundred quad-core nodes and a couple of weeks will get the job done. Were that not the case, oil would be a lot more expensive than it is. But we still do a lot of work to make the codes run faster.

Have Fun!
Reg
