Perhaps solving the dual is better. In general, if Y is an n x 1 vector (dependent variables), X is an n x p matrix (independent variables), b is an p x 1 vector of regression coefficients (the unknowns), the L1 regression problem is:
min ||Y - Xb||1 (the L1 norm of Y-Xb) This has dual of: max WY, s.t. WX = 0 and -1 <= Wi <= 1, where W are the unknown dual variables and is a 1 x n vector. The interior point method, when applied to this dual problem, is essentially performing a series of weighted least squares on the original problem. The dual values in this dual problem are the "b" in the original problem. I somewhat suspect that this is an easier and more stable problem for GLPK, because the Cholesky factorization is of a p x p matrix instead of an n x n matrix (p is usually much smaller than n). For the simple case in which p=2, with a constant and slope term (the cf12a.mod example), the GMPL code would be: /* Curve fitting problem 12.11(a) H P Williams "Model Building in Mathematical Programming" Dr. H J Mackenzie HARD software [email protected] 2006-01-05 modified to solve the dual: Dr. M. Meketon */ # set of points set I; # independent variable param x {i in I}; # dependent variable param y {i in I}; # output input values printf {i in I} "x = %.1f; y = %.1f\n", x[i], y[i]; # define equation variables var w{i in I}, >= -1, <= 1; # define objective function maximize dual_obj: sum {i in I} w[i]*y[i]; # define equation constraint s.t. constant_term : sum{i in I} w[i] = 0; s.t. slope_term: sum{i in I} w[i]*x[i] = 0; solve; # should be: y = 0.6375x + 0.5812 printf "y = %.4fx + %.4f\n", slope_term.dual, constant_term.dual; /* * * DATA section * */ data; param : I : x y := 1 0 1 2 0.5 0.9 3 1 0.7 4 1.5 1.5 5 1.9 2 6 2.5 2.4 7 3 3.2 8 3.5 2 9 4 2.7 10 4.5 3.5 11 5 1 12 5.5 4 13 6 3.6 14 6.6 2.7 15 7 5.7 16 7.6 4.6 17 8.5 6 18 9 6.8 19 10 7.3 ; end; -----Original Message----- From: [email protected] [mailto:[email protected]] On Behalf Of Reginald Beardsley Sent: Tuesday, December 18, 2012 12:22 PM To: glpk Subject: [Help-glpk] Interior point failure I've been fooling around w/ L1 fits to a line w/ random errors in X & Y to collect performance profiling data. The model is cf12a.mod w/ the data generated by an awk script. At slightly over 700 points, I find that the interior point method fails to converge sometimes. The number of failures in a run of 10 instances w/ 714 points varies from run to run. This is the case whether the data points vary from run to run or if the points are the same but the order in which they are supplied varies. command is: glpsol --interior -m tst.mod -d tst.dat can anyone point me to an explanation of why? I'm guessing this is a result of floating point arithmetic, but would like to better understand it. Thanks, Reg _______________________________________________ Help-glpk mailing list [email protected] https://lists.gnu.org/mailman/listinfo/help-glpk This e-mail and any attachments may be confidential or legally privileged. If you received this message in error or are not the intended recipient, you should destroy the e-mail message and any attachments or copies, and you are prohibited from retaining, distributing, disclosing or using any information contained herein. Please inform us of the erroneous delivery by return e-mail. Thank you for your cooperation. _______________________________________________ Help-glpk mailing list [email protected] https://lists.gnu.org/mailman/listinfo/help-glpk
