-------- Forwarded Message --------
From: Vladimir Dergachev <[email protected]>
To: glpk xypron <[email protected]>
Cc: Reginald Beardsley <[email protected]>, [email protected]
Subject: Re: [Help-glpk] Optimization and Multicore GLPK
Date: Tue, 25 Dec 2012 01:03:13 -0800 (PST)
On Tue, 25 Dec 2012, glpk xypron wrote:

> Hello Reg,
>
> == Profiling ==
>
> Profiling GLPK could give very valuable ideas about how to speed up
> the code.
>
> Oprofile is a useful tool for doing the work. See
> http://oprofile.sourceforge.net

There is also "perf", which newer kernels support.

> == Parallelization ==
>
> Before thinking about parallelization it would be necessary to attack
> the very same design decisions that make GLPK not thread safe: memory
> management and error handling.

In my experience, for compute-intensive codes it is not necessary to
make memory management especially friendly to multithreaded use - a
simple global lock will suffice. A well-optimized compute-intensive
code rarely (if ever) calls memory management functions, as they are
rather slow. For short-lived scratch memory, one can use alloca().
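To make the global-lock idea concrete, here is a minimal sketch in C
with POSIX threads. The wrapper names xmalloc_ts() and xfree_ts() are
made up for this example - they are not GLPK functions:

    /* Sketch only: serialize every allocation behind one global lock.
       xmalloc_ts()/xfree_ts() are hypothetical names, not GLPK API. */
    #include <pthread.h>
    #include <stdlib.h>

    static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;

    void *xmalloc_ts(size_t size)
    {
        void *p;
        pthread_mutex_lock(&alloc_lock);    /* one thread at a time */
        p = malloc(size);
        pthread_mutex_unlock(&alloc_lock);
        return p;
    }

    void xfree_ts(void *p)
    {
        pthread_mutex_lock(&alloc_lock);
        free(p);
        pthread_mutex_unlock(&alloc_lock);
    }

Because a well-optimized inner loop rarely allocates, this lock is
almost never contended. Scratch memory from alloca() needs no lock at
all, since it lives on the calling thread's stack.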
> When it comes to the question whether parallelization or optimization
> of the single-thread code is more valuable, I must admit that I do
> not care, as long as I can get back to the entry prompt faster. So if
> parallelization can give me a factor of 4 and a better algorithm a
> factor of 25, I would be happy to get both and solve my problem in
> 1 % of the time.

There is also the possibility of using SSE, or the newer AVX, to
achieve much higher throughput. Also, the upcoming Xeon Phi looks very
attractive for high-throughput computation.

best

Vladimir Dergachev

> There are different levels where parallelization might be considered
> for GLPK.
>
> One place is the simplex algorithm itself. An overview can be found in
> http://www.maths.ed.ac.uk/hall/ParSimplexRevised/ParSimplexApril2007.pdf
> One direction of research not covered there is running the simplex
> algorithm on GPUs.
>
> Another place for parallelization is the branch-and-bound algorithm,
> as described in
> http://web.ist.utl.pt/~ist11038/compute/_parallel_comp/Montana2007ParalMILP.pdf
> http://www.cc.gatech.edu/~bader/papers/ParallelBranchBound.pdf
>
> My impression is that parallelization of branch and bound in GLPK
> would require significantly less programming effort.
>
> Best regards
>
> Heinrich Schuchardt
>
> -------- Original Message --------
>> Date: Mon, 24 Dec 2012 17:37:22 -0800 (PST)
>> Subject: [Help-glpk] Optimization and Multicore GLPK
>
>> Here are a few comments on optimization and multicore GLPK. A search
>> online for information about optimizing GLPK only turned up xypron's
>> recent post about enabling SSE and fastmath. However, GLPK is very
>> sophisticated, so what I have to say may fall in the "not documented
>> yet" category. Except for Maros, I've seen little mention of
>> performance profiling and coverage analysis in the LP literature I've
>> read. Most of the performance metrics are limited to iteration counts
>> and elapsed time. Serious code optimization requires much finer
>> granularity of analysis.
>>
>> I have just completed an initial, very quick read of "Computational
>> Techniques of the Simplex Method" by Istvan Maros. For anyone
>> interested in working on LP codes, I think it is a necessary, but not
>> sufficient, reference. Maros gives a good overview of many issues
>> related to implementing a simplex code and to large-scale solvers in
>> general.
>>
>> It was typeset by the author, so there are typos and
>> non-native-speaker errors to contend with. However, all the important
>> ideas are accompanied by numerous citations, so with the addition of
>> the appropriate references it appears to provide a good map of the
>> terrain. I qualify my statement only because I am far too ignorant of
>> LP to make a stronger statement.
>>
>> I agree entirely with Andrew's earlier comments on the topic of
>> parallelization of GLPK. There is much work to be done on the serial
>> algorithms before it makes any sense to attempt to implement parallel
>> execution. Aside from the computational cost of pricing, the
>> particular choice of pricing rule can strongly influence solution
>> times for particular problems. So implementing Devex and other rules
>> seems to me a good starting point. I'm sure that other opportunities
>> to improve single-core performance exist, but this seems a
>> particularly obvious place to start.
>>
>> That said, barrier-synchronized shared-memory parallel programming
>> should be suitable for use in GLPK running on multicore workstations.
>> Actual performance is highly dependent upon cache architecture and
>> organization, so the benefit cannot be predicted easily. Considerable
>> experimentation and deep knowledge of the hardware behavior are
>> required to make this work well. The only reason it is worth
>> considering at all is that a single architecture dominates the
>> computing landscape. Intel is unlikely to do something that breaks a
>> lot of major codes, and core counts are likely to continue to grow.
>>
>> The concept is simple. In sections which are not execution-order
>> dependent, a small number of coroutines are started by a signal.
>> These coroutines then run to completion, at which time they signal
>> the main routine, which decrements a counter. When all the coroutines
>> have completed, the counter reaches zero and the main routine
>> continues. If one can avoid thrashing due to cache collisions, those
>> portions of the code can be sped up by the number of cores employed.
>> Of course, the total speedup will be much less, as dictated by
>> Amdahl's law.
>>
>> In looking at some of the source code for simplex, I see places where
>> there is a loop over vector operations. The natural thing to do for
>> these is to use SSE instructions to speed up the vector operations
>> and to spread the vectors over multiple cores.
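As a rough sketch of how the counter scheme and the vector loops above
could fit together (made-up names, not GLPK code): each worker thread
updates one slice of a vector - a loop the compiler can vectorize with
SSE/AVX - and then decrements a shared counter; the main thread waits
on a condition variable until the counter reaches zero.

    /* Sketch only: split y += a*x across worker threads; each worker
       decrements a shared counter when done, and the main thread
       waits until the counter reaches zero. Names here are made up. */
    #include <pthread.h>
    #include <stdio.h>

    #define N        1000000
    #define NWORKERS 4

    static double x[N], y[N], a = 3.0;

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  done = PTHREAD_COND_INITIALIZER;
    static int pending = NWORKERS;

    static void *worker(void *arg)
    {
        int id = (int)(long)arg;
        int lo = id * (N / NWORKERS);
        int hi = (id == NWORKERS - 1) ? N : lo + N / NWORKERS;

        /* the compiler can vectorize this loop with SSE/AVX */
        for (int i = lo; i < hi; i++)
            y[i] += a * x[i];

        pthread_mutex_lock(&lock);
        if (--pending == 0)             /* last one out signals main */
            pthread_cond_signal(&done);
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NWORKERS];

        for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

        for (long i = 0; i < NWORKERS; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);

        pthread_mutex_lock(&lock);      /* wait for the counter */
        while (pending > 0)
            pthread_cond_wait(&done, &lock);
        pthread_mutex_unlock(&lock);

        for (int i = 0; i < NWORKERS; i++)
            pthread_join(t[i], NULL);

        printf("y[0] = %g\n", y[0]);    /* expect 5 */
        return 0;
    }

Note the Amdahl's law ceiling: if, say, 80 % of the run time were in
such sections (a figure picked purely for illustration), four cores
would give at most 1 / (0.2 + 0.8/4) = 2.5x overall, not 4x.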
>> Though conceptually simple, none of this is easy to do. In addition,
>> GLPK is easily the most complex code I've ever looked at in 20+ years
>> of working on several million lines of scientific codes. Fortunately,
>> it's also hands down the best-written large code I've ever seen,
>> which is a real delight.
>>
>> Any optimization work on a code as sophisticated as GLPK is a major
>> undertaking that will take a long time to execute. The first step is
>> profiling and coverage analysis.
>>
>> If there is sufficient interest in this subject, I propose to
>> implement and document performance profiling and coverage analysis of
>> GLPK in the wikibook using the various Netlib problem sets. This
>> will, for convenience, be restricted to the *nix environment.
>> However, it should generally be applicable to Windows if a *nix
>> environment package such as Cygwin is used.
>>
>> I am particularly interested in comments from Andrew, Marc, Robbie
>> and xypron. All have been very generous to me in different ways, and
>> this is an attempt to repay them. I come from a seismic processing
>> background where run times are measured in tens of thousands of hours
>> for a single dataset. Fortunately, most seismic codes are trivially
>> parallel, so a few hundred quad-core nodes and a couple of weeks will
>> get the job done. Were that not the case, oil would be a lot more
>> expensive than it is. But we still do a lot of work to make the codes
>> run faster.
>>
>> Have Fun!
>> Reg

_______________________________________________
Help-glpk mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/help-glpk
