Hello, >> Slightly off-topic, but while we are thinking about this code ... >> >> A while back I did some crude profiling of the v.lidar tools and found >> that it was spending lots and lots of time in the Tcholetsky decomposition >> loop (3-for loops deep). >> > > That's better now because the matrix is a bit smaller.
Yes, its a nice symmetric band matrix. Was a bit puzzling for me to understand the creation of the matrix without any documentation! So, i have implemented some conversion functions into the gmath library while moving the tchol* code from lidar lib into the gmath lib: now transformation of qudratic <-> band <-> sparse matrices can be performed. A new CG band solver, the matrix type conversion and a new band matrix -vector multiplication function are available with tests in the svn-trunk of grass 7. >> >> It seemed like a good & simple test case for using OpenMP to multi-thread >> it, but I got stuck with it segfaulting. AFAIR the problem was that OpenMP >> wanted you to malloc the entire thing before starting, and this could >> get big. >> >> if it interests you, see the attached patch and >> https://trac.osgeo.org/grass/ticket/657 >> http://lists.osgeo.org/pipermail/grass-dev/2009-June/044709.html >> http://lists.osgeo.org/pipermail/grass-dev/2009-June/044705.html >> http://grass.osgeo.org/wiki/OpenMP >> > > Hmm, if I understand the code right, only the innermost for loop can be > executed in parallel because the first two for-loops need results of > previous runs (not possible to run i + 1 before i, same for j). But I don't > know that parallel stuff, I would in any case first optimize the code > without parallelization, only then add multithreading. I fully agree. The band cholesky solver is well suited for the job but not designed for parallelization. Thus i have implemented an iterative Conjugate Gradient (CG) solver for band matrices (just by replacing the matrix - vector multiplication) to replace the cholesky band solver. Well, the cholesky solver out-performes the CG solver by a factor of 10+. I have partly parallelized the CG solver for band matrices, but even the speedup on a 8 core system is to small to compete with the cholesky band solver. Besides of that, the created band matrices are of bad condition, so the CG solver may fail for subregions. Hence, i would suggest to use MPI parallelization or something else on process base, to concurrently compute the subregions which are created by default. Maybe a python script using overlapping tiling and subprocess can do the job? Best regards Soeren _______________________________________________ grass-dev mailing list [email protected] http://lists.osgeo.org/mailman/listinfo/grass-dev
