Hi Jian,

Thanks for sharing this performance testing code. As Chuck also demonstrated, measuring performance is critical for real progress.
Thanks,
Matt

On Thu, Mar 12, 2015 at 11:39 AM, Jian Cheng <jian.cheng.1...@gmail.com> wrote:
> Hi,
>
> For (2), the attachment is a comparison I made.
> It adds additional tests on vnl to the code from the previous link
> http://nghiaho.com/?p=1726 .
> Some header files can be downloaded from
> https://github.com/DiffusionMRITool/dmritool/tree/master/Modules/HelperFunctions/include
>
> Then you can build and run:
>
> // With eigen
> // g++ -DTEST_EIGEN test_matrix_pseudoinverse.cpp -o test_matrix_pseudoinverse -lopencv_core -O3 -DNDEBUG
>
> // With armadillo + OpenBLAS
> // g++ -DTEST_ARMA test_matrix_pseudoinverse.cpp -o test_matrix_pseudoinverse -lopencv_core -larmadillo -lgomp -fopenmp -lopenblas -O3 -DNDEBUG -DHAVE_INLINE
>
> // With vnl
> // g++ -DTEST_VNL test_matrix_pseudoinverse.cpp -o test_matrix_pseudoinverse -lopencv_core -lvnl -lvnl_algo -I/usr/include/vxl/core -I/usr/include/vxl/vcl -O3 -DNDEBUG -DUTL_USE_FASTLAPACK
>
> // With vnl + OpenBLAS
> // g++ -DTEST_VNL_BLAS test_matrix_pseudoinverse.cpp -o test_matrix_pseudoinverse -lopencv_core -lopenblas -lvnl -lvnl_algo -I/usr/include/vxl/core -I/usr/include/vxl/vcl -O3 -DNDEBUG -DUTL_USE_FASTLAPACK
>
> // With vnl + MKL
> // g++ -DTEST_VNL_BLAS test_matrix_pseudoinverse.cpp -o test_matrix_pseudoinverse -lopencv_core -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -lvnl -lvnl_algo -I/usr/include/vxl/core -I/usr/include/vxl/vcl -O3 -DNDEBUG -DUTL_USE_FASTLAPACK
>
> // With utl + OpenBLAS
> // g++ -DTEST_UTL test_matrix_pseudoinverse.cpp -o test_matrix_pseudoinverse -lopencv_core -lopenblas -lvnl -lvnl_algo -I/usr/include/vxl/core -I/usr/include/vxl/vcl -O3 -DNDEBUG -DUTL_USE_FASTLAPACK
>
> // With utl + MKL
> // g++ -DTEST_UTL test_matrix_pseudoinverse.cpp -o test_matrix_pseudoinverse -lopencv_core -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -lvnl -lvnl_algo -I/usr/include/vxl/core -I/usr/include/vxl/vcl -O3 -DNDEBUG -DUTL_USE_FASTLAPACK
>
> In my experiments, using the BLAS functions from MKL is the most efficient option.
>
> best,
> Jian Cheng
>
> On 03/12/2015 10:41 AM, Matt McCormick wrote:
>> Hi,
>>
>> From the discussion so far, it appears the following series of steps
>> could be taken to move forward on performance:
>>
>> 1) Replace vnl_vector and vnl_matrix with vnl_vector_fixed and
>> vnl_matrix_fixed when possible.
>>
>> 2) Add Jian Cheng's BLAS and LAPACK backends for vnl_vector and vnl_matrix.
>>
>> 3) Add support for armadillo or eigen.
>>
>> 1) and 2) will be relatively easy to make, and will hopefully have an
>> immediate impact on performance. 3) will take more work to happen,
>> and it will take longer to impact the toolkit. We will need to have
>> cross-platform builds of the libraries in the repository. Also, many
>> ITK classes encapsulate their use of VNL very poorly, so it will not
>> be as simple as swapping out or improving their backends.
>>
>> 2 cents,
>> Matt
>>
>> On Thu, Mar 12, 2015 at 10:20 AM, Bradley Lowekamp
>> <blowek...@mail.nih.gov> wrote:
>>> Chuck,
>>>
>>> Thank you for giving us that important conclusion, under quite difficult
>>> circumstances.
>>>
>>> I wonder if there is any distinction in the usage of vnl_matrix vs
>>> vnl_matrix_fixed. I would expect that operations done for pixel transforms
>>> should have their dimension known at compile time and should be able to use
>>> vnl_matrix_fixed.
>>>
>>> I have also considered methods to transform a whole array of points at a
>>> time. I wonder whether, for 3x3 * 3x256 sized operations (scan-line size),
>>> there would be a benefit from the library-based operations.
>>>
>>> Brad
>>>
>>>
>>> On Mar 12, 2015, at 10:02 AM, Chuck Atkins <chuck.atk...@kitware.com> wrote:
>>>
>>> I worked with Julie Langou, maintainer of LAPACK, on this project a few years ago.
>>> The funding situation ended up very strange and messy, and we
>>> basically had to cram three months' worth of effort into three weeks, so,
>>> needless to say, we were not able to really achieve our goals. However, we
>>> spent a fair amount of time profiling ITK and analyzing its vnl hot spots
>>> to determine where to best spend the small amount of time we had. The
>>> results were not as straightforward as we expected. It turns out that most
>>> of the use of vnl_matrix and vnl_vector was actually for an enormous number
>>> of operations on very small vectors and matrices (dimensions of 2, 3, or
>>> 4), often for coordinate and geometry calculations or for small per-pixel
>>> operations that were not easily vectorized in the implementation at the
>>> time. In these cases, the overhead of calling out to a BLAS or LAPACK
>>> library was much too expensive, and the existing use of VNL was far more
>>> optimal. This falls apart, however, when trying to use vnl for more complex
>>> algorithms, since the larger matrix operations are where the benefit can
>>> be seen. So just re-implementing the vnl vector and matrix classes and
>>> operators with underlying BLAS and LAPACK routines turned out not to be
>>> the best solution for ITK as a whole.
>>>
>>> To take advantage of the performance gains of large block matrix and vector
>>> operations seen with optimized BLAS and LAPACK libraries, the
>>> computations needed to be re-worked to act in an SoA (struct of
>>> arrays)* fashion instead. Given our limited time and resources, this
>>> was out of scope for what we could tackle.
>>>
>>> * Typically AoS and SoA refer to storage layout, but I'm using it to
>>> refer to computation layout. The terminology may not be correct, but
>>> I think you can understand what I mean.
>>>
>>> - Chuck
>>> On Thu, Mar 12, 2015 at 8:32 AM, Bradley Lowekamp <blowek...@mail.nih.gov>
>>> wrote:
>>>> Hello,
>>>>
>>>> If I were writing my own ITK classes and needed a fast matrix library, I
>>>> would likely pursue an additional dependency on an efficient numeric
>>>> library for that project, such as eigen.
>>>>
>>>> However, for the broad appeal of ITK, I would think a flexible back end
>>>> would be best, as there are a variety of BLAS and LAPACK libraries
>>>> available (commercial, open source, vendor, free). It would be nice to
>>>> pick one that has been optimized for the current architecture. I would
>>>> think it would be most flexible to use this interface in the back end of
>>>> a chosen numeric interface (currently VNL). Unfortunately, I don't have
>>>> as much experience with these libraries as I'd like.
>>>>
>>>> Brad
>>>>
>>>> On Mar 12, 2015, at 5:15 AM, m.star...@lumc.nl wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I think the eigen library is a mature and very fast library for this
>>>>> kind of thing:
>>>>> http://eigen.tuxfamily.org/index.php?title=Main_Page
>>>>>
>>>>> You may want to check it out, to see if it offers what you need.
>>>>>
>>>>> It would be great to be able to use it within ITK.
>>>>>
>>>>> 2c
>>>>> Marius
>>>>>
>>>>> -----Original Message-----
>>>>> From: Insight-developers [mailto:insight-developers-boun...@itk.org] On
>>>>> Behalf Of Jian Cheng
>>>>> Sent: Wednesday, March 11, 2015 23:17
>>>>> To: Matt McCormick
>>>>> Cc: Chuck Atkins; ITK
>>>>> Subject: Re: [ITK-dev] efficiency of vnl_matrix
>>>>>
>>>>> Hi Matt,
>>>>>
>>>>> Thanks for your help, and also for the ITK workshop at UNC last time.
>>>>>
>>>>> It is very unfortunate. The efficiency of these numerical math operators
>>>>> is very important for many applications.
>>>>>
>>>>> I recently released an ITK-based toolbox, called dmritool, for diffusion
>>>>> MRI data processing.
>>>>> It has some files that add support for BLAS, LAPACK, and MKL to
>>>>> vnl_matrix and vnl_vector:
>>>>>
>>>>> http://diffusionmritool.github.io/dmritool_doxygen/utlBlas_8h_source.html
>>>>> http://diffusionmritool.github.io/dmritool_doxygen/utlVNLBlas_8h_source.html
>>>>>
>>>>> Those functions are not internal to the vnl_matrix class; they are
>>>>> operators on the data pointer stored in the vnl_matrix object.
>>>>> Thus, I later made an N-dimensional array library which internally
>>>>> includes those functions and also supports expression templates to avoid
>>>>> temporary copies:
>>>>>
>>>>> http://diffusionmritool.github.io/dmritool_doxygen/utlMatrix_8h_source.html
>>>>> http://diffusionmritool.github.io/dmritool_doxygen/utlVector_8h_source.html
>>>>>
>>>>> The efficiency comparison between vnl_vector/vnl_matrix and the
>>>>> vector/matrix classes using OpenBLAS, LAPACK, or MKL can be found by
>>>>> running these two tests:
>>>>>
>>>>> https://github.com/DiffusionMRITool/dmritool/blob/master/Modules/HelperFunctions/test/utlVNLBlasGTest.cxx
>>>>> https://github.com/DiffusionMRITool/dmritool/blob/master/Modules/HelperFunctions/test/utlVNLLapackGTest.cxx
>>>>>
>>>>> Maybe some of this code can be used for patches somewhere in ITK. I am
>>>>> not sure; maybe we need more discussion on it.
>>>>> With your help and discussion, I will be very glad to make my first
>>>>> patch to ITK.
>>>>> Thanks.
>>>>>
>>>>> best,
>>>>> Jian Cheng
>>>>>
>>>>>
>>>>> On 03/11/2015 04:39 PM, Matt McCormick wrote:
>>>>>> Hi Jian,
>>>>>>
>>>>>> Yes, it would be wonderful to improve the efficiency of these basic
>>>>>> numerical operations.
>>>>>>
>>>>>> Funding for the Refactor Numerical Libraries effort has ended, and
>>>>>> the effort is currently frozen. However, you are more than welcome to
>>>>>> pick it up, and we can help you get it into ITK. More information on
>>>>>> the patch submission process can be found here [1] and in the ITK
>>>>>> Software Guide.
>>>>>>
>>>>>> Thanks,
>>>>>> Matt
>>>>>>
>>>>>> [1]
>>>>>> https://insightsoftwareconsortium.github.io/ITKBarCamp-doc/CommunitySoftwareProcess/SubmitAPatchToGerrit/index.html
>>>>>>
>>>>>> On Wed, Mar 11, 2015 at 4:07 PM, Jian Cheng <jian.cheng.1...@gmail.com>
>>>>>> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> My task using ITK involves intensive matrix-matrix products,
>>>>>>> pseudo-inverses, etc.
>>>>>>> Thus the performance is mainly determined by the matrix library I use.
>>>>>>> At first I used vnl_matrix and vnl_vector in ITK. Then I found they are
>>>>>>> very inefficient, because the vnl matrix library does not use BLAS and
>>>>>>> LAPACK.
>>>>>>> After I wrote my own matrix class, which uses OpenBLAS and LAPACK, I
>>>>>>> got a huge gain in performance.
>>>>>>>
>>>>>>> I found there is a proposal to improve the efficiency of the numerical
>>>>>>> libraries in ITK:
>>>>>>> http://www.itk.org/Wiki/ITK/Release_4/Refactor_Numerical_Libraries
>>>>>>> I am not sure what the progress of the proposal is.
>>>>>>> I wonder when the vnl matrix library will internally support BLAS and
>>>>>>> LAPACK, or MKL, so that we can just use it without loss of efficiency.
>>>>>>> Thanks.
>>>>>>>
>>>>>>> best,
>>>>>>> Jian Cheng
>>>>>>> _______________________________________________
>>>>>>> Powered by www.kitware.com
>>>>>>>
>>>>>>> Visit other Kitware open-source projects at
>>>>>>> http://www.kitware.com/opensource/opensource.html
>>>>>>>
>>>>>>> Kitware offers ITK Training Courses, for more information visit:
>>>>>>> http://kitware.com/products/protraining.php
>>>>>>>
>>>>>>> Please keep messages on-topic and check the ITK FAQ at:
>>>>>>> http://www.itk.org/Wiki/ITK_FAQ
>>>>>>>
>>>>>>> Follow this link to subscribe/unsubscribe:
>>>>>>> http://public.kitware.com/mailman/listinfo/insight-developers