-------- Forwarded Message --------
From: Vladimir Dergachev <[email protected]>
To: glpk xypron <[email protected]>
Cc: Reginald Beardsley <[email protected]>, [email protected]
Subject: Re: [Help-glpk] Optimization and Multicore GLPK
Date: Tue, 25 Dec 2012 01:03:13 -0800 (PST)
On Tue, 25 Dec 2012, glpk xypron wrote:

> Hello Reg,
>
> == Profiling ==
>
> Profiling GLPK could give very valuable ideas about how to speed up
> the code.
>
> Oprofile is a useful tool for doing the work. See
> http://oprofile.sourceforge.net

There is also "perf", which newer kernels support.

> == Parallelization ==
>
> Before thinking about parallelization it would be necessary to attack
> the very same design decisions that make GLPK not thread safe: memory
> management and error handling.

In my experience, for compute-intensive codes it is not necessary to
make memory management especially friendly to multithreaded use - a
simple global lock will suffice. A well-optimized compute-intensive
code rarely (if ever) calls memory management functions, as they are
rather slow. For short-lived scratch memory, one can use alloca().
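To make the global-lock idea concrete, here is a minimal sketch in C
with POSIX threads. The wrapper names xmalloc_ts() and xfree_ts() are
made up for this example - they are not GLPK functions:

    /* Sketch only: serialize every allocation behind one global lock.
       xmalloc_ts()/xfree_ts() are hypothetical names, not GLPK API. */
    #include <pthread.h>
    #include <stdlib.h>

    static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;

    void *xmalloc_ts(size_t size)
    {
        void *p;
        pthread_mutex_lock(&alloc_lock);    /* one thread at a time */
        p = malloc(size);
        pthread_mutex_unlock(&alloc_lock);
        return p;
    }

    void xfree_ts(void *p)
    {
        pthread_mutex_lock(&alloc_lock);
        free(p);
        pthread_mutex_unlock(&alloc_lock);
    }

Because a well-optimized inner loop rarely allocates, this lock is
almost never contended. Scratch memory from alloca() needs no lock at
all, since it lives on the calling thread's stack.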
> When it comes to the question whether parallelization or optimization
> of the single-thread code is more valuable, I must admit that I do
> not care, as long as I can get back to the entry prompt faster. So if
> parallelization can give me a factor of 4 and a better algorithm a
> factor of 25, I would be happy to get both and solve my problem in
> 1 % of the time.

There is also the possibility of using SSE, or the newer AVX, to
achieve much higher throughput. Also, the upcoming Xeon Phi looks very
attractive for high-throughput computation.

best

Vladimir Dergachev

> There are different levels where parallelization might be considered
> for GLPK.
>
> One place is the simplex algorithm itself. An overview can be found in
> http://www.maths.ed.ac.uk/hall/ParSimplexRevised/ParSimplexApril2007.pdf
> One direction of research not covered there is running the simplex
> algorithm on GPUs.
>
> Another place for parallelization is the branch-and-bound algorithm,
> as described in
> http://web.ist.utl.pt/~ist11038/compute/_parallel_comp/Montana2007ParalMILP.pdf
> http://www.cc.gatech.edu/~bader/papers/ParallelBranchBound.pdf
>
> My impression is that parallelization of branch and bound in GLPK
> would require significantly less programming effort.
>
> Best regards
>
> Heinrich Schuchardt
>
> -------- Original Message --------
>> Date: Mon, 24 Dec 2012 17:37:22 -0800 (PST)
>> Subject: [Help-glpk] Optimization and Multicore GLPK
>
>> Here are a few comments on optimization and multicore GLPK. A search
>> online for information about optimizing GLPK only turned up xypron's
>> recent post about enabling SSE and fastmath. However, GLPK is very
>> sophisticated, so what I have to say may fall in the "not documented
>> yet" category. Except for Maros, I've seen little mention of
>> performance profiling and coverage analysis in the LP literature I've
>> read. Most of the performance metrics are limited to iteration counts
>> and elapsed time. Serious code optimization requires much finer
>> granularity of analysis.
>>
>> I have just completed an initial, very quick read of "Computational
>> Techniques of the Simplex Method" by Istvan Maros. For anyone
>> interested in working on LP codes, I think it is a necessary, but not
>> sufficient, reference. Maros gives a good overview of many issues
>> related to implementing a simplex code and to large-scale solvers in
>> general.
>>
>> It was typeset by the author, so there are typos and
>> non-native-speaker errors to contend with. However, all the important
>> ideas are accompanied by numerous citations, so with the addition of
>> the appropriate references it appears to provide a good map of the
>> terrain. I qualify my statement only because I am far too ignorant of
>> LP to make a stronger statement.
>>
>> I agree entirely with Andrew's earlier comments on the topic of
>> parallelization of GLPK. There is much work to be done on the serial
>> algorithms before it makes any sense to attempt to implement parallel
>> execution. Aside from the computational cost of pricing, the
>> particular choice of pricing rule can strongly influence solution
>> times for particular problems. So implementing Devex and other rules
>> seems to me a good starting point. I'm sure that other opportunities
>> to improve single-core performance exist, but this seems a
>> particularly obvious place to start.
>>
>> That said, barrier-synchronized shared-memory parallel programming
>> should be suitable for use in GLPK running on multicore workstations.
>> Actual performance is highly dependent upon cache architecture and
>> organization, so the benefit cannot be predicted easily. Considerable
>> experimentation and deep knowledge of the hardware behavior are
>> required to make this work well. The only reason it is worth
>> considering at all is that a single architecture dominates the
>> computing landscape. Intel is unlikely to do something that breaks a
>> lot of major codes, and core counts are likely to continue to grow.
>>
>> The concept is simple. In sections which are not execution-order
>> dependent, a small number of coroutines are started by a signal.
>> These coroutines then run to completion, at which time they signal
>> the main routine, which decrements a counter. When all the coroutines
>> have completed, the counter reaches zero and the main routine
>> continues. If one can avoid thrashing due to cache collisions, those
>> portions of the code can be sped up by the number of cores employed.
>> Of course, the total speedup will be much less, as dictated by
>> Amdahl's law.
>>
>> In looking at some of the source code for simplex, I see places where
>> there is a loop over vector operations. The natural thing to do for
>> these is to use SSE instructions to speed up the vector operations
>> and to spread the vectors over multiple cores.
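As a rough sketch of how the counter scheme and the vector loops above
could fit together (made-up names, not GLPK code): each worker thread
updates one slice of a vector - a loop the compiler can vectorize with
SSE/AVX - and then decrements a shared counter; the main thread waits
on a condition variable until the counter reaches zero.

    /* Sketch only: split y += a*x across worker threads; each worker
       decrements a shared counter when done, and the main thread
       waits until the counter reaches zero. Names here are made up. */
    #include <pthread.h>
    #include <stdio.h>

    #define N        1000000
    #define NWORKERS 4

    static double x[N], y[N], a = 3.0;

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  done = PTHREAD_COND_INITIALIZER;
    static int pending = NWORKERS;

    static void *worker(void *arg)
    {
        int id = (int)(long)arg;
        int lo = id * (N / NWORKERS);
        int hi = (id == NWORKERS - 1) ? N : lo + N / NWORKERS;

        /* the compiler can vectorize this loop with SSE/AVX */
        for (int i = lo; i < hi; i++)
            y[i] += a * x[i];

        pthread_mutex_lock(&lock);
        if (--pending == 0)             /* last one out signals main */
            pthread_cond_signal(&done);
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NWORKERS];

        for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

        for (long i = 0; i < NWORKERS; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);

        pthread_mutex_lock(&lock);      /* wait for the counter */
        while (pending > 0)
            pthread_cond_wait(&done, &lock);
        pthread_mutex_unlock(&lock);

        for (int i = 0; i < NWORKERS; i++)
            pthread_join(t[i], NULL);

        printf("y[0] = %g\n", y[0]);    /* expect 5 */
        return 0;
    }

Note the Amdahl's law ceiling: if, say, 80 % of the run time were in
such sections (a figure picked purely for illustration), four cores
would give at most 1 / (0.2 + 0.8/4) = 2.5x overall, not 4x.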
>> Though conceptually simple, none of this is easy to do. In addition,
>> GLPK is easily the most complex code I've ever looked at in 20+ years
>> of working on several million lines of scientific codes. Fortunately,
>> it's also hands down the best-written large code I've ever seen,
>> which is a real delight.
>>
>> Any optimization work on a code as sophisticated as GLPK is a major
>> undertaking that will take a long time to execute. The first step is
>> profiling and coverage analysis.
>>
>> If there is sufficient interest in this subject, I propose to
>> implement and document performance profiling and coverage analysis of
>> GLPK in the wikibook using the various Netlib problem sets. This
>> will, for convenience, be restricted to the *nix environment.
>> However, it should generally be applicable to Windows if a *nix
>> environment package such as Cygwin is used.
>>
>> I am particularly interested in comments from Andrew, Marc, Robbie
>> and xypron. All have been very generous to me in different ways, and
>> this is an attempt to repay them. I come from a seismic processing
>> background where run times are measured in tens of thousands of hours
>> for a single dataset. Fortunately, most seismic codes are trivially
>> parallel, so a few hundred quad-core nodes and a couple of weeks will
>> get the job done. Were that not the case, oil would be a lot more
>> expensive than it is. But we still do a lot of work to make the codes
>> run faster.
>>
>> Have Fun!
>> Reg

_______________________________________________
Help-glpk mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/help-glpk
