Hi, I'm having real trouble believing that matrix copying is the major bottleneck in an optimization algorithm. Copying is O(N^2) operations. Surely, for any problem where performance matters, it is completely dwarfed by the O(N^3) complexity of actually solving the normal equation.
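[A back-of-the-envelope operation tally (my own, not from the thread) may help frame this: for a tall m x n Jacobian, copying costs on the order of m*n, forming the normal equations m*n^2, and the n x n decomposition only n^3, so for small n the copy is comparable to the solve rather than dwarfed by it. A sketch:]

```java
class OpCounts {
    // Rough floating-point operation counts for one Gauss-Newton step
    // on an m x n Jacobian (m observations, n parameters).
    static long copyOps(long m, long n)     { return m * n; }     // copy J
    static long normalEqOps(long m, long n) { return m * n * n; } // form J^T J
    static long solveOps(long n)            { return n * n * n; } // decompose n x n

    public static void main(String[] args) {
        long m = 1000, n = 2; // the CircleVectorial case discussed below
        System.out.println("copy J:     " + copyOps(m, n));     // 2000
        System.out.println("form J^T J: " + normalEqOps(m, n)); // 4000
        System.out.println("solve:      " + solveOps(n));       // 8
    }
}
```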
Also, I think testing should be done on an actual large problem where scaling issues would show up. The 1000x2 Jacobian results in a 2x2 normal equation. Surely this is not a good test case.

Konstantin

On Tue, Feb 25, 2014 at 8:35 AM, Evan Ward <evan.w...@nrl.navy.mil> wrote:
>
> On 02/24/2014 05:47 PM, Gilles wrote:
> > On Mon, 24 Feb 2014 14:41:34 -0500, Evan Ward wrote:
> >> On 02/24/2014 01:23 PM, Gilles wrote:
> >>> On Mon, 24 Feb 2014 11:49:26 -0500, Evan Ward wrote:
> >>>> One way to improve performance would be to provide pre-allocated space
> >>>> for the Jacobian and reuse it for each evaluation.
> >>>
> >>> Do you have actual data to back this statement?
> >>
> >> I did some tests with the CircleVectorial problem from the test cases.
> >> The Jacobian is 1000x2, and I ran it 1000 times until HotSpot stopped
> >> making it run faster. The first column is the current state of the code.
> >> The second column is with one less matrix allocation per problem
> >> evaluation.
> >>
> >>        trunk    -1 alloc   %change
> >> lu     0.90 s   0.74 s     -17%
> >> chol   0.90 s   0.74 s     -17%
> >> qr     0.87 s   0.70 s     -20%
> >>
> >> I also see similar reductions in runtime using 1e6 observations and 3
> >> observations. We could save 3-4 allocations per evaluation, which
> >> extrapolates to a 60%-80% reduction in run time.
> >
> > I would not have expected such a big impact.
> > How did you set up the benchmark? Actually, it would be a sane starting
> > point to create a "performance" unit test (see e.g.
> > "FastMathTestPerformance" in package "o.a.c.m.util").
>
> The test I used: https://gist.github.com/wardev/3f183dfaaf297dc2462c
>
> I also modified CircleVectorial to use the MultivariateJacobianFunction
> to avoid the translation copies.
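[Evan's actual benchmark is in the gist linked above; as a rough, self-contained illustration of the pattern he describes (warm up until HotSpot stabilizes, then time fresh allocation against buffer reuse), something like the following could be used. The class and the fill function are illustrative stand-ins, not taken from the gist:]

```java
class AllocBenchmark {
    // Model evaluation that allocates a fresh Jacobian on every call.
    static double[][] evaluateFresh(int m, int n) {
        double[][] j = new double[m][n];
        fill(j);
        return j;
    }

    // Model evaluation that writes into a caller-owned buffer.
    static void evaluateInPlace(double[][] j) {
        fill(j);
    }

    // Stand-in for computing the Jacobian entries.
    static void fill(double[][] j) {
        for (int i = 0; i < j.length; i++)
            for (int k = 0; k < j[i].length; k++)
                j[i][k] = i + k;
    }

    public static void main(String[] args) {
        int m = 1000, n = 2, iters = 10_000;
        double[][] reusable = new double[m][n];
        // Warm up so HotSpot compiles both paths before timing.
        for (int i = 0; i < iters; i++) {
            evaluateFresh(m, n);
            evaluateInPlace(reusable);
        }
        long t0 = System.nanoTime();
        for (int i = 0; i < iters; i++) evaluateFresh(m, n);
        long t1 = System.nanoTime();
        for (int i = 0; i < iters; i++) evaluateInPlace(reusable);
        long t2 = System.nanoTime();
        System.out.printf("fresh: %d us, in-place: %d us%n",
                (t1 - t0) / 1000, (t2 - t1) / 1000);
    }
}
```

[A dedicated harness such as JMH would give more trustworthy numbers; this only sketches the warm-up-then-time structure.]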
>
> >
> >>
> >>>> The
> >>>> LeastSquaresProblem interface would then be:
> >>>>
> >>>> void evaluate(RealVector point, RealVector resultResiduals, RealVector
> >>>> resultJacobian);
> >>>>
> >>>> I'm interested in hearing your ideas on other approaches to solve this
> >>>> issue. Or even if this is an issue worth solving.
> >>>
> >>> Not before we can be sure that in-place modification (rather than
> >>> reallocation) always provides a performance benefit.
> >>
> >> I would like to hear other ideas for improving the performance.
> >
> > Design-wise, it is quite ugly to modify input parameters. I think that it
> > could also hurt in the long run, by preventing other ways to improve
> > performance.
>
> I agree, it smells like Fortran. ;)
>
> > Why couldn't the reuse vs reallocate be delegated to implementations of
> > the "MultivariateJacobianFunction" interface?
>
> That is the change I made to produce the performance numbers. It works
> well as long as the application has a single thread. Once it is
> multi-threaded we would have to create a new Function for each thread.
> I'm reluctant to do that because I think not copying the whole problem
> for each thread is a nice feature.
>
> Really I think we need one matrix per optimization context. The question
> is: how does the Function, which is several method calls below the
> optimize() method, access the current optimization context? If we reuse
> the matrix at the Function level then we are implicitly tying the
> optimization context to the thread context. By passing the matrices as
> parameters we make the context explicit, but we clutter up the API.
> Perhaps there is a better solution we haven't thought of yet.
>
> > Eventually, doesn't it boil down to creating "RealVector" and "RealMatrix"
> > implementations that modify and return "this" rather than create a new
> > object?
>
> Interesting idea. I think this will remove the allocation+copy when
> applying weights.
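[A minimal sketch of the "explicit context" option Evan describes: the optimizer owns the work arrays and passes them down as parameters, so nothing is implicitly tied to a thread. The interface and the circle-style residual below are hypothetical simplifications using plain arrays, not the actual Commons Math CircleVectorial or MultivariateJacobianFunction:]

```java
// Hypothetical in-place API: the optimizer owns the work arrays and passes
// them down explicitly (this is NOT the actual Commons Math interface).
interface InPlaceJacobianFunction {
    void value(double[] point, double[] outResiduals, double[][] outJacobian);
}

// Simplified circle fit: parameters are (cx, cy, radius); the residual is
// the distance from each observed point to the center, minus the radius.
class CircleResiduals implements InPlaceJacobianFunction {
    private final double[][] obs; // observed (x, y) points

    CircleResiduals(double[][] obs) { this.obs = obs; }

    @Override
    public void value(double[] p, double[] r, double[][] j) {
        for (int i = 0; i < obs.length; i++) {
            double dx = p[0] - obs[i][0];
            double dy = p[1] - obs[i][1];
            double d = Math.sqrt(dx * dx + dy * dy);
            r[i] = d - p[2];       // residual written into caller's array
            j[i][0] = dx / d;      // d(residual)/d(cx)
            j[i][1] = dy / d;      // d(residual)/d(cy)
            j[i][2] = -1.0;        // d(residual)/d(radius)
        }
    }
}
```

[Because the same function instance carries no mutable state of its own, it can be shared across threads as long as each optimization context supplies its own buffers.]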
> That just leaves the case where matrix decompositions create a copy of
> the input matrix.
>
> Best Regards,
> Evan
>
> > Best Regards,
> > Gilles
> >
> >>>> Evan
> >>
> >> On 02/24/2014 12:09 PM, Luc Maisonobe wrote:
> >>> Hi Evan,
> >>>
> >>> On 24/02/2014 17:49, Evan Ward wrote:
> >>>> I've looked into improving performance further, but it seems any
> >>>> further improvements will need big API changes for memory management.
> >>>>
> >>>> Currently using Gauss-Newton with Cholesky (or LU) requires 4 matrix
> >>>> allocations _each_ evaluation. The objective function initially
> >>>> allocates the Jacobian matrix. Then the weights are applied through
> >>>> matrix multiplication, allocating a new matrix. Computing the normal
> >>>> equations allocates a new matrix to hold the result, and finally the
> >>>> decomposition allocates its own matrix as a copy. With QR there are 3
> >>>> matrix allocations per model function evaluation, since there is no
> >>>> need to compute the normal equations, but the third allocation+copy is
> >>>> larger. Some empirical sampling data I've collected with the jvisualvm
> >>>> tool indicates that matrix allocation and copying takes 30% to 80% of
> >>>> the execution time, depending on the dimension of the Jacobian.
> >>>>
> >>>> One way to improve performance would be to provide pre-allocated space
> >>>> for the Jacobian and reuse it for each evaluation. The
> >>>> LeastSquaresProblem interface would then be:
> >>>>
> >>>> void evaluate(RealVector point, RealVector resultResiduals, RealVector
> >>>> resultJacobian);
> >>>>
> >>>> I'm interested in hearing your ideas on other approaches to solve this
> >>>> issue. Or even if this is an issue worth solving.
> >>> Yes, I think this issue is worth solving, especially since we are going
> >>> to ship 3.3 and need to fix as much as possible before the release,
> >>> thus avoiding future problems. Everything spotted now is worth fixing
> >>> now.
> >>>
> >>> Your approach seems reasonable, as long as the work arrays are really
> >>> allocated at the start of the optimization and shared only through a
> >>> few documented methods like the one you propose. This would mean we can
> >>> say in the javadoc that these areas should be used only to fulfill the
> >>> API requirements and not copied elsewhere, as they *will* be modified
> >>> as the algorithm runs, and are explicitly devoted to avoiding
> >>> reallocation. I guess this kind of problem is more important when lots
> >>> of observations are processed, which corresponds to a very frequent use
> >>> case (at least in the fields I know about).
> >>>
> >>> For the record, what you propose seems similar to what is done in the
> >>> ODE package, as the state vector and its first derivatives are also
> >>> kept in preallocated arrays which are reused throughout the integration
> >>> and are used to exchange data between the Apache Commons Math algorithm
> >>> and the user problem to be solved. So it is something we already do
> >>> elsewhere.
> >>
> >> OK. We could keep the Evaluation interface, which would just reference
> >> the pre-allocated residuals and matrix. If the result parameters are
> >> null the LSP could allocate a matrix of the correct size automatically.
> >> So then the interface would look like:
> >>
> >> Evaluation evaluate(RealVector point, RealVector resultResiduals,
> >> RealVector resultJacobian);
> >>
> >>> best regards,
> >>> Luc
> >>>
> >>>> Best Regards,
> >>>> Evan
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> >>>> For additional commands, e-mail: dev-h...@commons.apache.org
> >>
> >> On 02/24/2014 01:16 PM, Gilles wrote:
> >>> On Mon, 24 Feb 2014 18:09:26 +0100, Luc Maisonobe wrote:
> >>>> Hi Evan,
> >>>>
> >>>> On 24/02/2014 17:49, Evan Ward wrote:
> >>>>> I've looked into improving performance further, but it seems any
> >>>>> further improvements will need big API changes for memory management.
> >>>>>
> >>>>> Currently using Gauss-Newton with Cholesky (or LU) requires 4 matrix
> >>>>> allocations _each_ evaluation. The objective function initially
> >>>>> allocates the Jacobian matrix. Then the weights are applied through
> >>>>> matrix multiplication, allocating a new matrix. Computing the normal
> >>>>> equations allocates a new matrix to hold the result, and finally the
> >>>>> decomposition allocates its own matrix as a copy. With QR there are 3
> >>>>> matrix allocations per model function evaluation, since there is no
> >>>>> need to compute the normal equations, but the third allocation+copy
> >>>>> is larger. Some empirical sampling data I've collected with the
> >>>>> jvisualvm tool indicates that matrix allocation and copying takes 30%
> >>>>> to 80% of the execution time, depending on the dimension of the
> >>>>> Jacobian.
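[The nullable-parameter idea Evan proposes could look roughly like this: a hypothetical class with plain arrays standing in for RealVector/RealMatrix. When the caller passes null, the problem allocates a correctly sized buffer itself; otherwise it reuses the one supplied:]

```java
// Hypothetical sketch only; plain arrays stand in for RealVector/RealMatrix.
class LeastSquaresProblemSketch {
    private final int m; // number of observations
    private final int n; // number of parameters

    LeastSquaresProblemSketch(int m, int n) {
        this.m = m;
        this.n = n;
    }

    // If the caller supplies a Jacobian buffer, reuse it; otherwise
    // allocate one of the correct size automatically.
    double[][] evaluate(double[] point, double[][] resultJacobian) {
        double[][] j = (resultJacobian != null) ? resultJacobian
                                                : new double[m][n];
        // ... fill j from the model function here ...
        return j;
    }
}
```

[Callers that care about allocation pressure pass a preallocated buffer; casual callers pass null and get the current behavior.]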
> >>>>>
> >>>>> One way to improve performance would be to provide pre-allocated
> >>>>> space for the Jacobian and reuse it for each evaluation. The
> >>>>> LeastSquaresProblem interface would then be:
> >>>>>
> >>>>> void evaluate(RealVector point, RealVector resultResiduals,
> >>>>> RealVector resultJacobian);
> >>>>>
> >>>>> I'm interested in hearing your ideas on other approaches to solve
> >>>>> this issue. Or even if this is an issue worth solving.
> >>>>
> >>>> Yes, I think this issue is worth solving, especially since we are
> >>>> going to ship 3.3 and need to fix as much as possible before the
> >>>> release, thus avoiding future problems. Everything spotted now is
> >>>> worth fixing now.
> >>>>
> >>>> Your approach seems reasonable, as long as the work arrays are really
> >>>> allocated at the start of the optimization and shared only through a
> >>>> few documented methods like the one you propose. This would mean we
> >>>> can say in the javadoc that these areas should be used only to fulfill
> >>>> the API requirements and not copied elsewhere, as they *will* be
> >>>> modified as the algorithm runs, and are explicitly devoted to avoiding
> >>>> reallocation. I guess this kind of problem is more important when lots
> >>>> of observations are processed, which corresponds to a very frequent
> >>>> use case (at least in the fields I know about).
> >>>>
> >>>> For the record, what you propose seems similar to what is done in the
> >>>> ODE package, as the state vector and its first derivatives are also
> >>>> kept in preallocated arrays which are reused throughout the
> >>>> integration and are used to exchange data between the Apache Commons
> >>>> Math algorithm and the user problem to be solved. So it is something
> >>>> we already do elsewhere.
> >>>
> >>> If I understand correctly what is being discussed, I do not agree with
> >>> this approach.
> >>>
> >>> The optimization/fitting algorithms must use matrix abstractions.
> >>> If performance improvements can be achieved, they must happen at the
> >>> level of the appropriate matrix implementations.
> >>
> >> The matrix abstractions will still be used in the interface. As far as I
> >> can tell none of the optimizers or linear algebra classes use the matrix
> >> abstractions internally. For example LU, QR, and Cholesky all copy the
> >> matrix data to an internal double[][]. I tried computing the normal
> >> equation in GaussNewton as j.transpose().multiply(j), but the
> >> performance was bad because j.transpose() creates a copy of the matrix.
> >> That's why we have the current ugly for-loop implementation with
> >> getEntry() and setEntry(). Maybe matrix "views" could help solve the
> >> issue.
> >>
> >>> Best regards,
> >>> Gilles
> >>
> >> Regards,
> >> Evan
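[The loop Evan describes can avoid both the transpose copy and the intermediate multiply allocation. A sketch with raw double[][] (not the Commons Math RealMatrix API) that forms J^T J in a single pass over the rows of J, using symmetry to halve the work:]

```java
class NormalEquations {
    // Form N = J^T J directly from J, without materializing J^T.
    // Accumulates only the upper triangle, then mirrors it.
    static double[][] normalMatrix(double[][] j) {
        int m = j.length, n = j[0].length;
        double[][] nMat = new double[n][n];
        for (int i = 0; i < m; i++) {
            double[] row = j[i]; // one row of J contributes row^T * row
            for (int a = 0; a < n; a++)
                for (int b = a; b < n; b++)
                    nMat[a][b] += row[a] * row[b];
        }
        // Mirror the upper triangle into the lower triangle.
        for (int a = 0; a < n; a++)
            for (int b = 0; b < a; b++)
                nMat[a][b] = nMat[b][a];
        return nMat;
    }
}
```

[Iterating row-wise also matches the memory layout of a double[][], so it is cache-friendly for the tall, thin Jacobians discussed in this thread.]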