-------- Forwarded Message --------
From: Matteo Fischetti DEI <[email protected]>
Reply-to: [email protected]
To: Andrew Makhorin <[email protected]>
Subject: randomization in MIPs
Date: Mon, 17 Dec 2012 11:28:11 +0100
Hi Andrew. I have a paper on the role of randomization at
http://www.dei.unipd.it/~fisch/papers/exploiting_erraticism_in_search.pdf
See Table 1, where a multi-thread experiment is simulated by running K copies of the same MIP solver in parallel, with absolutely no communication, just by randomizing the runs. I recently gave a provocative talk about this (well-known) behavior; see Mike Trick's blog at http://mat.tepper.cmu.edu/blog/?p=1695
Below I copied a recent mail of mine to the cbc list, where the same instance is solved by cbc in 79 or 1816 seconds, depending on the random seed used.

In my view, a quick-and-dirty way to obtain a multithread GLPK version is:

1. introduce a random seed that can be set at the beginning of the run, and let some decisions in the MIP solver be affected by a random value (see below);

2. run K independent copies of GLPK in parallel, with different seeds, and abort the execution of all threads as soon as one thread finishes.

In case one thread runs into numerical trouble, it can automatically replicate itself by starting another thread that solves the same MIP from scratch, but with a different seed, and then abort. More elaborate strategies can of course be implemented, e.g., collecting the best incumbent among all threads (so the best heuristic solution is always available to the user), and possibly sharing it among threads.

As to point 1 above, randomization can be introduced:

a. by reshuffling the internal order of the variables (i.e., as reported in the MIPLIB 2010 paper, just randomly permuting the columns can act as a significant perturbation);

b. by randomizing the choices in case of (almost) ties, e.g., in the heuristics and when selecting the branching variable;

c. when the LP is perturbed because of degeneracy;

d. right after the first LP relaxation at the root node, by making a small number of random pivots on zero-reduced-cost columns, just to sample a different LP solution on the optimal face (see my paper above for details). Note that randomizing the very first optimal LP solution is a good idea, as it affects all the subsequent choices (cuts, heuristics, branching).

If you want to make a very quick shot and see how randomization can speed up GLPK in a K-thread environment (K = 1, 2, 5, 10, 20), you can run GLPK K times on the same set of (hard) instances after having randomly permuted the input columns according to option a. above, and for each instance collect the smallest computing time out of the K runs. I recently convinced Hans Mittelmann to do this experiment with Cplex 12.5, see http://plato.asu.edu/ftp/path.html
Two rough sketches along these lines are appended right below, before the forwarded cbc mail.
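As a rough illustration of option a. (GLPK currently has no seed option, so permuting the input columns is the easiest externally available randomization), here is a minimal sketch that uses only the public GLPK API to write a column-permuted copy of a model. The program name, the file arguments, the optional seed argument and the choice of the CPLEX LP format are just placeholders for the example, and error checking is omitted.

/* permute.c -- write a copy of a model with randomly permuted columns
 * (option a. above).  Minimal sketch using the public GLPK API only. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <glpk.h>

int main(int argc, char *argv[])
{  /* usage: ./permute in.lp out.lp [seed] */
   if (argc < 3)
   {  fprintf(stderr, "usage: %s in.lp out.lp [seed]\n", argv[0]);
      return 1;
   }
   glp_prob *P = glp_create_prob(), *Q = glp_create_prob();
   glp_read_lp(P, NULL, argv[1]);
   srand(argc > 3 ? (unsigned) atoi(argv[3]) : (unsigned) time(NULL));
   int m = glp_get_num_rows(P), n = glp_get_num_cols(P);
   /* draw a random permutation perm[1..n] (Fisher-Yates) */
   int *perm = malloc((n + 1) * sizeof(int));
   for (int j = 1; j <= n; j++) perm[j] = j;
   for (int j = n; j > 1; j--)
   {  int r = 1 + rand() % j, t = perm[j];
      perm[j] = perm[r], perm[r] = t;
   }
   /* copy the rows as they are */
   glp_set_obj_dir(Q, glp_get_obj_dir(P));
   glp_add_rows(Q, m);
   for (int i = 1; i <= m; i++)
   {  if (glp_get_row_name(P, i) != NULL)
         glp_set_row_name(Q, i, glp_get_row_name(P, i));
      glp_set_row_bnds(Q, i, glp_get_row_type(P, i),
         glp_get_row_lb(P, i), glp_get_row_ub(P, i));
   }
   /* copy the columns in permuted order */
   glp_add_cols(Q, n);
   int *ind = malloc((m + 1) * sizeof(int));
   double *val = malloc((m + 1) * sizeof(double));
   for (int j = 1; j <= n; j++)
   {  int src = perm[j];
      if (glp_get_col_name(P, src) != NULL)
         glp_set_col_name(Q, j, glp_get_col_name(P, src));
      glp_set_col_bnds(Q, j, glp_get_col_type(P, src),
         glp_get_col_lb(P, src), glp_get_col_ub(P, src));
      glp_set_col_kind(Q, j, glp_get_col_kind(P, src));
      glp_set_obj_coef(Q, j, glp_get_obj_coef(P, src));
      int len = glp_get_mat_col(P, src, ind, val);
      glp_set_mat_col(Q, j, len, ind, val);
   }
   glp_set_obj_coef(Q, 0, glp_get_obj_coef(P, 0)); /* objective constant */
   glp_write_lp(Q, NULL, argv[2]);
   glp_delete_prob(P);
   glp_delete_prob(Q);
   free(perm), free(ind), free(val);
   return 0;
}

Running it K times with different seeds on the same instance gives the K randomized copies needed for the experiment described above.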
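And here is a rough POSIX sketch of point 2: it races K glpsol processes, one per randomized copy of the instance, and aborts the others as soon as the first one finishes. The helper itself and the file-naming scheme are assumptions of the example and not anything that ships with GLPK; the glpsol options used (--lp, -o) are the standard ones.

/* race.c -- point 2 above: run K independent copies of the solver in
 * parallel and abort all of them as soon as the first one finishes.
 * Each copy copy_k.lp is assumed to be a randomized (e.g. column-
 * permuted) version of the same instance. */
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(int argc, char *argv[])
{  /* usage: ./race copy_1.lp copy_2.lp ... copy_K.lp */
   int K = argc - 1;
   if (K < 1)
   {  fprintf(stderr, "usage: %s copy_1.lp ... copy_K.lp\n", argv[0]);
      return 1;
   }
   pid_t pid[K];
   for (int k = 0; k < K; k++)
   {  pid[k] = fork();
      if (pid[k] == 0)
      {  /* child k: solve the k-th copy, writing its solution to sol_k.txt */
         char sol[64];
         snprintf(sol, sizeof sol, "sol_%d.txt", k + 1);
         execlp("glpsol", "glpsol", "--lp", argv[k + 1], "-o", sol,
            (char *) NULL);
         _exit(127); /* exec failed */
      }
   }
   /* parent: wait for the first child to terminate ... */
   int status;
   pid_t first = wait(&status);
   printf("first finisher: pid %ld\n", (long) first);
   /* ... and abort all the remaining runs */
   for (int k = 0; k < K; k++)
      if (pid[k] != first) kill(pid[k], SIGTERM);
   while (wait(NULL) > 0) ; /* reap the aborted children */
   return 0;
}

The per-run solution files also make it easy to implement the "collect the best incumbent among all threads" idea mentioned above: just keep the best objective found among the sol_k.txt files.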
Keep in touch
--M--

-------- Original Message --------
Subject: Re: [Cbc] multi-thread compile for cbc-2.7.8
Date: Fri, 30 Nov 2012 17:01:59 +0100
From: Matteo Fischetti DEI <[email protected]>
Reply-to: [email protected]
To: Noli Sicad <[email protected]>
CC: cbc <[email protected]>

Hi Noli. Let me summarize the situation.

1. Proximity search is only available in CBC trunk; in interactive mode, it is activated by -proximity on.

2. On stable CBC 2.7.8 the proximity flag is accepted but not used (yet).

3. In interactive mode, you can check the LOG to see the effect of the proximity heuristic, e.g.

   > cbc ..\seymour.lp -proximity on -solve
   Welcome to the CBC MILP Solver
   Version: Trunk (unstable)
   Build Date: Nov 29 2012
   ...
   Cbc0012I Integer solution of 428 found by feasibility pump after 0 iterations and 0 nodes (16.67 seconds)
   Cbc0012I Integer solution of 427 found by RINS after 0 iterations and 0 nodes (19.44 seconds)
   ...
   Cbc0012I Integer solution of 426 found by Proximity Search after 0 iterations and 0 nodes (37.95 seconds)
   ...

4. Proximity search is a refinement heuristic, hence it is NOT applied when no solution is found at the root.

5. Now I come to your important question below:

>>> However, I could not understand why 2.7.8 runs faster than the trunk. The trunk solved the same MIP double the time (i.e. 2x longer) than using 2.7.8 with or without the proximity search. It seems that proximity search does not matter if you run threads (i.e. threads 8).

My explanation is that you are just facing "erraticism" here, meaning that small changes in the initial conditions (including using a different architecture/compiler/number of threads) can randomly change LP solutions --> branching --> search path, sometimes allowing you to converge much earlier. I recently gave a provocative talk about this (well-known) behavior; see Mike Trick's blog at http://mat.tepper.cmu.edu/blog/?p=1695

Coming to your specific instance (TimberHarvest_0010p.lp), tonight I ran cbc trunk (1 thread) 99 times with the command

   > cbc ..\TimberHarvest_0010p.lp -randomCbcSeed 0 -randomSeed 0 -solve

where -randomCbcSeed and -randomSeed allow you to change the internal CLP and CBC random seeds (0 uses the clock as seed). I posted the LOGs at www.dei.unipd.it/~fisch/rand.txt; here is an extract:

   run    CPU sec.s       nodes
   --------------------------------
   #001      422.80     688,438
   #002      120.42     541,641
   #003      289.89     980,125
   #004      442.28     472,552
   #005      599.50     694,749
   #006      638.95   1,192,398
   #007      220.62     464,118
   #008      446.78     581,108
   #009      231.38     627,515
   #010      275.19     714,783
   ...
   #069       79.25     262,105
   ...
   #099     1816.64   1,998,075

So, depending on the random seeds, your instance can be solved in 79 or 1816 seconds (!!).

6. Of course, not all instances behave the same, but you had better know that sometimes erraticism can play a relevant role, e.g., when you do parameter tuning--or when you try to explain why a certain idea (out of the 100 you tried) is working magically well...

7. In my view, erraticism is an opportunity rather than an issue--e.g., by running the same solver multiple times with different random seeds (possibly in parallel), you can hopefully find better heuristic solutions at the root node. Here is an example: running 5 times

   > cbc ..\seymour.lp -randomCbcSeed 0 -randomSeed 0 -solve

produces the following root-node heuristic solutions:

   run #1: Objective value: 429.00000000
   run #2: Objective value: 428.00000000
   run #3: Objective value: 426.00000000
   run #4: Objective value: 427.00000000
   run #5: Objective value: 427.00000000

Analogously, running

   > cbc ..\seymour.lp -randomCbcSeed 0 -randomSeed 0 -proximity on -passF 0 -solve

produced (still at the root):

   run #1: Objective value: 426.00000000
   run #2: Objective value: 425.00000000
   run #3: Objective value: 425.00000000
   run #4: Objective value: 424.00000000
   run #5: Objective value: 423.00000000  <- optimal value

8. Any comment on the subject is welcome.

Best
--Matteo--

--
Prof. Matteo Fischetti
DEI, University of Padova
via Gradenigo 6/A
I-35131 Padova (Italy)
e-mail: [email protected]
web: www.dei.unipd.it/~fisch
reports: www.dei.unipd.it/~fisch/papers

_______________________________________________
Help-glpk mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/help-glpk
