First, a big thanks to Richard Lincoln and Mirko Todorovski for their valuable 
contributions since we put MATPOWER on GitHub. Richard contributed the Travis 
CI integration <https://travis-ci.org/MATPOWER/matpower/> that now 
automatically runs the full test suite on every commit and pull request 
(supplying a little green check mark when everything passes). And Mirko 
contributed some distribution power flow algorithms and cases.

Second, in response to the very informative post to MATPOWER-L back in December 
by Jose Luis Marin (see below), I made some modifications to the way the update 
step is computed in the Newton-Raphson power flow, resulting in significant 
speed improvements (up to 2x) for large models under MATLAB (only marginal 
improvements under Octave).

You can see the details and comment in this pull request ...

    https://github.com/MATPOWER/matpower/pull/7

… or by checking out the lu branch 
<https://github.com/MATPOWER/matpower/tree/lu> of the Git repository. Jose, I'd 
especially welcome any further comments you might have. In particular, it 
appears that the vast majority of the speedup comes from the AMD reordering 
rather than from the Gilbert-Peierls algorithm itself.
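
For anyone who wants to experiment, a rough way to separate the two effects is 
to time something like the following three variants on the same Jacobian J and 
mismatch vector F (illustrative only, not the exact code in the pull request):

    % 1) default sparse backslash (UMFPACK)
    dx1 = -(J \ F);

    % 2) Gilbert-Peierls LU with partial pivoting, no explicit reordering
    [L, U, P] = lu(J, 1.0);
    dx2 = -(U \ (L \ (P * F)));

    % 3) AMD reordering + Gilbert-Peierls LU (as in Jose's patch below)
    Q = sparse(amd(J), 1:size(J,1), 1);   % permutation matrix from AMD ordering
    [L, U, P] = lu(Q'*J*Q, 1.0);
    dx3 = -Q * (U \ (L \ (P * Q' * F)));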

    Ray





> Begin forwarded message:
> 
> From: Jose Luis Marín <mari...@aia.es>
> Subject: Re: RunPF vs GPU calculations
> Date: December 1, 2016 at 12:52:05 PM EST
> To: MATPOWER discussion forum <matpowe...@list.cornell.edu>
> Reply-To: MATPOWER discussion forum <matpowe...@list.cornell.edu>
> 
> Hello all,
> 
> I've been meaning to report the following findings to Ray and the MATPOWER 
> team for some time, but never got around to doing it.  Well, here it is in 
> summarized form:
> 
> 1. There's a very cheap way to speed up the J \ F step (actually, the sparse 
>    LU decomposition) by factors of about x1.3 to x1.8, and maybe higher for 
>    bigger cases.  It just involves invoking the MATLAB solver that is best 
>    suited for power system matrices (more below). 
> 
> 2. GPU computing, at least in its current state, is not a panacea for the 
>    power flow problem, because the kind of sparse linear algebra that's 
>    needed in power flow problems just doesn't offer enough opportunities to 
>    exploit GPU architectures. 
> 
> Try this simple change in newtonpf.m:
> 
> marinjl@xwing:~/Library/matpower6.0b2$ diff newtonpf.m.ORIG newtonpf.m
> 89c89,93
> <     dx = -(J \ F);
> ---
> >     %dx = -(J \ F);
> >     % JLM: For performance, factorize using MATLAB's Gilbert-Peierls + AMD (see T. Davis)
> >     Q = sparse(amd(J), 1:size(J,1), 1);  % AMD reordering
> >     [L, U, P] = lu(Q'*J*Q, 1.0);  % this form ensures that G-P is called. Last arg: 0==no pivoting, 1.0==partial pivoting.
> >     dx = - Q * ( U \ ( L \ (P * Q' * F)));
> 
> 
> My benchmarks (3GHz Intel Core 2 Duo) yield the following speedups:
> 
> %                    UMFPACK        G-P        speedup
> % ======================================================
> % case300:          1.689e-02    1.230e-02      1.37
> % case1354pegase:   5.848e-02    3.652e-02      1.60
> % case1888rte:      2.792e-02    2.130e-02      1.31
> % case1951rte:      3.037e-02    2.169e-02      1.40
> % case2383wp:       1.429e-01    8.711e-02      1.64
> % case2736sp:       1.327e-01    7.512e-02      1.76
> % case2737sop:      1.632e-01    9.074e-02      1.79
> % case2746wop:      1.589e-01    8.725e-02      1.82
> % case2746wp:       1.448e-01    8.636e-02      1.67
> % case2848rte:      4.652e-02    3.301e-02      1.40
> % case2868rte:      1.681e-01    9.661e-02      1.74
> % case2869pegase:   2.402e-01    1.335e-01      1.79
> % case3012wp:       1.125e-01    6.733e-02      1.67
> % case3120sp:       2.155e-01    1.228e-01      1.75
> % case3375wp:       1.598e-01    9.565e-02      1.67
> % case6468rte:      2.155e-01    1.443e-01      1.49
> % case6470rte:      2.316e-01    1.434e-01      1.61
> % case6495rte:      2.118e-01    1.433e-01      1.47
> % case6515rte:      2.244e-01    1.443e-01      1.55
> % case9241pegase:   1.185e+00    6.392e-01      1.85
> % case13659pegase:  1.484e+00    7.991e-01      1.85
> %
> %
> % max speedup: 1.85
> % min speedup: 1.31
> 
> (Note: I used pf.tol = 1.0e-10 in order to trigger a few more iterations. I 
> ran each case 10 times and kept the best timing.)
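> 
> For example, the tolerance can be set through mpoption, along these lines 
> (the case name here is just an example):
> 
>     mpopt = mpoption('pf.tol', 1e-10);         % tighter convergence tolerance
>     results = runpf('case2869pegase', mpopt);  % Newton-Raphson power flow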
> 
> 
> Longer explanation:
> The Jacobian matrices that one typically obtains in power networks are very, 
> very sparse.  They are very different from the sparse matrices obtained in 
> other areas (such as finite-element modeling), where the matrix still has 
> fairly dense blocks here and there (or at least it can be re-arranged to have 
> them).  Power system matrices, as well as many electronic circuit matrices, 
> have dense blocks of sizes 3x3 or 4x4 at most, which is rather small.  Now, 
> the vast majority of research going into sparse LU solvers focuses on the 
> former kind of matrices, i.e. those with large dense blocks.  The methods go 
> by names like "supernodal" and "multifrontal", and basically consist of 
> designing the algorithm so that it can exploit the BLAS for processing those 
> dense blocks.  The point is that power system matrices are ill-suited to 
> exploit the power of the BLAS, and therefore those methods are actually 
> slower than more straightforward sparse solvers that don't try to use the 
> BLAS at all.
> 
> MATLAB contains sparse solvers of both types, but the problem is that it's 
> not clearly documented how the backslash operator (or the lu call) internally 
> uses one algorithm or the other.  It turns out that the way you call lu() 
> influences the algorithm being used!  The default is UMFPACK, but if you use 
> the "three-output, two-input args" call as above then you'll be invoking a 
> Gilbert-Peierls method.  As you may have guessed by now, UMFPACK is one of 
> those methods using the BLAS, while G-P is not.  As shown in the benchmarks 
> above, UMFPACK clearly has an overhead.
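> 
> In code, the difference is just in how lu() is invoked; roughly (the exact 
> dispatch rules are not spelled out in the documentation):
> 
>     [L, U, P, Q] = lu(J);       % four sparse outputs: UMFPACK (BLAS-based)
>     [L, U, P]    = lu(J, 1.0);  % three outputs + pivot threshold: Gilbert-Peierls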
> 
> Coming back to the issue of GPUs, one can see why the potential performance 
> gains for this type of sparse linear algebra are not as great as they seem. 
> 
> If you are interested in more details, I can provide references.  See for 
> instance SuiteSparse by Tim Davis, which in terms of sparse solvers is 
> probably the best in class now (some of it is already integrated within 
> MATLAB, btw).
> 
> -- 
> Jose L. Marin
> Grupo AIA
