Hi Jian,

Thanks for sharing this performance testing code. As Chuck also demonstrated, measuring performance is critical for real progress.
Thanks,
Matt

On Thu, Mar 12, 2015 at 11:39 AM, Jian Cheng <jian.cheng.1...@gmail.com> wrote:
> Hi,
>
> For (2), the attachment is a comparison I made.
> It adds additional tests on vnl to the code from the previous link
> http://nghiaho.com/?p=1726 .
> Some header files can be downloaded from
> https://github.com/DiffusionMRITool/dmritool/tree/master/Modules/HelperFunctions/include
>
> Then you can build and run:
>
> // With eigen
> // g++ -DTEST_EIGEN test_matrix_pseudoinverse.cpp -o test_matrix_pseudoinverse -lopencv_core -O3 -DNDEBUG
>
> // With armadillo + OpenBLAS
> // g++ -DTEST_ARMA test_matrix_pseudoinverse.cpp -o test_matrix_pseudoinverse -lopencv_core -larmadillo -lgomp -fopenmp -lopenblas -O3 -DNDEBUG -DHAVE_INLINE
>
> // With vnl
> // g++ -DTEST_VNL test_matrix_pseudoinverse.cpp -o test_matrix_pseudoinverse -lopencv_core -lvnl -lvnl_algo -I/usr/include/vxl/core -I/usr/include/vxl/vcl -O3 -DNDEBUG -DUTL_USE_FASTLAPACK
>
> // With vnl + OpenBLAS
> // g++ -DTEST_VNL_BLAS test_matrix_pseudoinverse.cpp -o test_matrix_pseudoinverse -lopencv_core -lopenblas -lvnl -lvnl_algo -I/usr/include/vxl/core -I/usr/include/vxl/vcl -O3 -DNDEBUG -DUTL_USE_FASTLAPACK
>
> // With vnl + MKL
> // g++ -DTEST_VNL_BLAS test_matrix_pseudoinverse.cpp -o test_matrix_pseudoinverse -lopencv_core -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -lvnl -lvnl_algo -I/usr/include/vxl/core -I/usr/include/vxl/vcl -O3 -DNDEBUG -DUTL_USE_FASTLAPACK
>
> // With utl + OpenBLAS
> // g++ -DTEST_UTL test_matrix_pseudoinverse.cpp -o test_matrix_pseudoinverse -lopencv_core -lopenblas -lvnl -lvnl_algo -I/usr/include/vxl/core -I/usr/include/vxl/vcl -O3 -DNDEBUG -DUTL_USE_FASTLAPACK
>
> // With utl + MKL
> // g++ -DTEST_UTL test_matrix_pseudoinverse.cpp -o test_matrix_pseudoinverse -lopencv_core -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -lvnl -lvnl_algo -I/usr/include/vxl/core -I/usr/include/vxl/vcl -O3 -DNDEBUG -DUTL_USE_FASTLAPACK
>
> In my experiments, using the BLAS functions from MKL is the most efficient option.
>
> best,
> Jian Cheng
>
> On 03/12/2015 10:41 AM, Matt McCormick wrote:
>> Hi,
>>
>> From the discussion so far, it appears the following series of steps
>> could be taken to move forward on performance:
>>
>> 1) Replace vnl_vector and vnl_matrix with vnl_vector_fixed and
>> vnl_matrix_fixed when possible.
>>
>> 2) Add Jian Cheng's BLAS and LAPACK backends for vnl_vector and vnl_matrix.
>>
>> 3) Add support for armadillo or eigen.
>>
>> 1) and 2) will be relatively easy to make, and will hopefully have an
>> immediate impact on performance. 3) will take more work to happen,
>> and it will take longer to impact the toolkit. We will need to have
>> cross-platform builds of the libraries in the repository. Also, many
>> ITK classes encapsulate their use of VNL very poorly, so it will not
>> be as simple as swapping out or improving their backends.
>>
>> 2 cents,
>> Matt
>>
>> On Thu, Mar 12, 2015 at 10:20 AM, Bradley Lowekamp
>> <blowek...@mail.nih.gov> wrote:
>>> Chuck,
>>>
>>> Thank you for giving us that important conclusion, under quite difficult
>>> circumstances.
>>>
>>> I wonder if there is any distinction in the usage of vnl_matrix vs
>>> vnl_matrix_fixed. I would expect that operations done for pixel transforms
>>> should have their dimension known at compile time and should be able to use
>>> vnl_matrix_fixed.
>>>
>>> I have also considered methods to transform a whole array of points at a
>>> time. I wonder whether, for 3x3 * 3x256 sized operations (scan-line size),
>>> there would be a benefit from the library-based operations.
>>>
>>> Brad
>>>
>>>
>>> On Mar 12, 2015, at 10:02 AM, Chuck Atkins <chuck.atk...@kitware.com> wrote:
>>>
>>> I worked with Julie Langou, maintainer of LAPACK, on this project a few years ago.
>>> The funding situation ended up very strange and messy, and we
>>> basically had to cram three months' worth of effort into three weeks, so,
>>> needless to say, we were not able to really achieve our goals. However, we
>>> spent a fair amount of time profiling ITK and analyzing its vnl hot spots
>>> to determine where to best spend the small amount of time we had. The
>>> results were not as straightforward as we expected. It turns out that most
>>> of the use of vnl_matrix and vnl_vector was actually for an enormous number
>>> of operations on very small vectors and matrices (dimensions of 2, 3, or
>>> 4), often for coordinate and geometry calculations or for small per-pixel
>>> operations that were not easily vectorized in the implementation at the
>>> time. In these cases, the overhead of calling out to a BLAS or LAPACK
>>> library was much too expensive, and the existing use of VNL was far more
>>> optimal. This falls apart, however, when trying to use vnl for more complex
>>> algorithms, since the larger matrix operations are where the benefit can
>>> be seen. So just re-implementing the vnl vector and matrix classes and
>>> operators with underlying BLAS and LAPACK routines turned out not to be
>>> the best solution for ITK as a whole.
>>>
>>> To take advantage of the performance gains of large block matrix and vector
>>> operations seen with optimized BLAS and LAPACK libraries, the
>>> computations needed to be re-worked to act in an SoA (struct of
>>> arrays)* fashion instead. Given our limited time and resources, this
>>> was out of scope for what we could tackle.
>>>
>>> * Typically AoS and SoA refer to storage layout, but I'm using it to
>>> refer to computation layout. The terminology may not be correct, but
>>> I think you can understand what I mean.
>>>
>>> - Chuck
>>> On Thu, Mar 12, 2015 at 8:32 AM, Bradley Lowekamp <blowek...@mail.nih.gov>
>>> wrote:
>>>> Hello,
>>>>
>>>> If I were writing my own ITK classes and needed a fast matrix library, I
>>>> would likely pursue an additional dependency on an efficient numeric
>>>> library for that project, such as eigen.
>>>>
>>>> However, for the broad appeal of ITK, I would think a flexible back end
>>>> would be best, as there are a variety of BLAS and LAPACK libraries
>>>> available (commercial, open source, vendor, free). It would be nice to
>>>> pick one that has been optimized for the current architecture. I would
>>>> think it would be most flexible to use this interface in the back end of
>>>> a chosen numeric interface (currently VNL). Unfortunately, I don't have
>>>> as much experience with these libraries as I'd like.
>>>>
>>>> Brad
>>>>
>>>> On Mar 12, 2015, at 5:15 AM, m.star...@lumc.nl wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I think the eigen library is a mature and very fast library for this
>>>>> kind of thing:
>>>>> http://eigen.tuxfamily.org/index.php?title=Main_Page
>>>>>
>>>>> You may want to check it out, to see if it offers what you need.
>>>>>
>>>>> It would be great to be able to use it within ITK.
>>>>>
>>>>> 2c
>>>>> Marius
>>>>>
>>>>> -----Original Message-----
>>>>> From: Insight-developers [mailto:insight-developers-boun...@itk.org] On
>>>>> Behalf Of Jian Cheng
>>>>> Sent: Wednesday, March 11, 2015 23:17
>>>>> To: Matt McCormick
>>>>> Cc: Chuck Atkins; ITK
>>>>> Subject: Re: [ITK-dev] efficiency of vnl_matrix
>>>>>
>>>>> Hi Matt,
>>>>>
>>>>> Thanks for your help, and also for the ITK workshop at UNC last time.
>>>>>
>>>>> It is very unfortunate. The efficiency of these numerical math operators
>>>>> is very important for many applications.
>>>>>
>>>>> I recently released an ITK-based toolbox, called dmritool, for diffusion
>>>>> MRI data processing.
>>>>> It has some files that add support for BLAS, LAPACK, and MKL to
>>>>> vnl_matrix and vnl_vector:
>>>>>
>>>>> http://diffusionmritool.github.io/dmritool_doxygen/utlBlas_8h_source.html
>>>>> http://diffusionmritool.github.io/dmritool_doxygen/utlVNLBlas_8h_source.html
>>>>>
>>>>> Those functions are not internal to the vnl_matrix class; they are
>>>>> operators on the data pointer stored in the vnl_matrix object.
>>>>> Thus, I later made an N-dimensional array library which internally
>>>>> includes those functions and also supports expression templates to avoid
>>>>> temporary copies:
>>>>>
>>>>> http://diffusionmritool.github.io/dmritool_doxygen/utlMatrix_8h_source.html
>>>>> http://diffusionmritool.github.io/dmritool_doxygen/utlVector_8h_source.html
>>>>>
>>>>> The efficiency comparison between vnl_vector/vnl_matrix and the
>>>>> vector/matrix classes using OpenBLAS, LAPACK, or MKL can be found by
>>>>> running these two tests:
>>>>>
>>>>> https://github.com/DiffusionMRITool/dmritool/blob/master/Modules/HelperFunctions/test/utlVNLBlasGTest.cxx
>>>>> https://github.com/DiffusionMRITool/dmritool/blob/master/Modules/HelperFunctions/test/utlVNLLapackGTest.cxx
>>>>>
>>>>> Maybe some of this code can be used for patches somewhere in ITK. I am
>>>>> not sure; maybe we need more discussion on it.
>>>>> With your help and discussion, I will be very glad to make my first
>>>>> patch to ITK.
>>>>> Thanks.
>>>>>
>>>>> best,
>>>>> Jian Cheng
>>>>>
>>>>>
>>>>> On 03/11/2015 04:39 PM, Matt McCormick wrote:
>>>>>> Hi Jian,
>>>>>>
>>>>>> Yes, it would be wonderful to improve the efficiency of these basic
>>>>>> numerical operations.
>>>>>>
>>>>>> Funding for the Refactor Numerical Libraries effort has ended, and
>>>>>> the effort is currently frozen. However, you are more than welcome to
>>>>>> pick it up, and we can help you get it into ITK. More information on
>>>>>> the patch submission process can be found here [1] and in the ITK
>>>>>> Software Guide.
>>>>>>
>>>>>> Thanks,
>>>>>> Matt
>>>>>>
>>>>>> [1]
>>>>>> https://insightsoftwareconsortium.github.io/ITKBarCamp-doc/CommunitySoftwareProcess/SubmitAPatchToGerrit/index.html
>>>>>>
>>>>>> On Wed, Mar 11, 2015 at 4:07 PM, Jian Cheng <jian.cheng.1...@gmail.com>
>>>>>> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> My task using ITK involves intensive matrix-matrix products,
>>>>>>> pseudo-inverses, etc.
>>>>>>> Thus the performance is mainly determined by the matrix library I use.
>>>>>>> At first I used vnl_matrix and vnl_vector in ITK. Then I found they are
>>>>>>> very inefficient, because the vnl matrix library does not use BLAS and
>>>>>>> LAPACK.
>>>>>>> After I wrote my own matrix class, which uses OpenBLAS and LAPACK, I
>>>>>>> got a huge gain in performance.
>>>>>>>
>>>>>>> I found there is a proposal to improve the efficiency of the numerical
>>>>>>> libraries in ITK:
>>>>>>> http://www.itk.org/Wiki/ITK/Release_4/Refactor_Numerical_Libraries
>>>>>>> I am not sure what the progress of the proposal is.
>>>>>>> I wonder when the vnl matrix library will internally support BLAS and
>>>>>>> LAPACK, or MKL, so that we can just use it without loss of efficiency.
>>>>>>> Thanks.
>>>>>>>
>>>>>>> best,
>>>>>>> Jian Cheng
>>>>>>> _______________________________________________
>>>>>>> Powered by www.kitware.com
>>>>>>>
>>>>>>> Visit other Kitware open-source projects at
>>>>>>> http://www.kitware.com/opensource/opensource.html
>>>>>>>
>>>>>>> Kitware offers ITK Training Courses, for more information visit:
>>>>>>> http://kitware.com/products/protraining.php
>>>>>>>
>>>>>>> Please keep messages on-topic and check the ITK FAQ at:
>>>>>>> http://www.itk.org/Wiki/ITK_FAQ
>>>>>>>
>>>>>>> Follow this link to subscribe/unsubscribe:
>>>>>>> http://public.kitware.com/mailman/listinfo/insight-developers