Hello,
After some tests, for the intended use (multiplying a matrix by a scalar), dgemm is not faster than dscal. However, in the C code of "iMultiRealScalarByRealMatrix", the part which takes most of the CPU time is the call to "dcopy": for example, on my machine, for a 10000x10000 matrix, the call to dcopy takes 540 milliseconds and the call to dscal only 193 milliseconds.
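Just to make the cost structure concrete, a copy-free variant would only touch each element once. This is purely an illustration of where the dcopy time goes (the function name is mine, it is not the actual Scilab code and I have not benchmarked it):

int iMultiRealScalarByRealMatrixNoCopy(
    double _dblReal1,
    double *_pdblReal2, int _iRows2, int _iCols2,
    double *_pdblRealOut)
{
    int i;
    int iSize2 = _iRows2 * _iCols2;

    /* scale directly into the output buffer: one read and one write per
       element, instead of dcopy (read+write) followed by dscal (read+write) */
    for (i = 0; i < iSize2; i++)
    {
        _pdblRealOut[i] = _dblReal1 * _pdblReal2[i];
    }
    return 0;
}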
Continuing my explorations today, I tried to see how Scilab expressions such as

Y+2*X

are parsed and executed. To this purpose I have written an interface (daxpy.sci and daxpy.c, attached) to the BLAS function "daxpy", which does "y <- y + a*x", and a script comparing the above expression to

daxpy(2,X,Y)

for two 10000x10000 matrices. Here are the results (MacBook Air, Core i7 @ 1.7 GHz):
daxpy(2,X,Y)
  (dcopy: 582 ms)
  (daxpy: 211 ms)
  elapsed time: 793 ms

Y+2*X
  elapsed time: 1574 ms
Considering the above figures, the explanation is that in "Y+2*X" there are *two* copies of a 10000x10000 matrix instead of only one in "daxpy(2,X,Y)". In "Y+2*X+3*Z" there would be three copies, although there could be only one if daxpy were used twice, as in the sketch below.
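To make that point concrete, here is a rough sketch (not the attached daxpy.c; the daxpy prototype is my own guess, written in the same style as the other gateways) of how Out = Y + 2*X + 3*Z could be computed with a single dcopy and two daxpy calls:

extern int C2F(dcopy) (int *_iSize, double *_pdblSrc, int *_iIncSrc,
                       double *_pdblDest, int *_iIncDest);
extern int C2F(daxpy) (int *_iSize, double *_pdblAlpha, double *_pdblX,
                       int *_iIncX, double *_pdblY, int *_iIncY);

int iAddTwoScaledRealMatrices(
    double *_pdblY, double *_pdblX, double *_pdblZ,
    int _iRows, int _iCols,
    double *_pdblRealOut)
{
    int iOne = 1;
    int iSize = _iRows * _iCols;
    double dblTwo = 2;
    double dblThree = 3;

    /* Out <- Y : the only full copy of a 10000x10000 matrix */
    C2F(dcopy)(&iSize, _pdblY, &iOne, _pdblRealOut, &iOne);
    /* Out <- Out + 2*X : updated in place, no extra copy */
    C2F(daxpy)(&iSize, &dblTwo, _pdblX, &iOne, _pdblRealOut, &iOne);
    /* Out <- Out + 3*Z : updated in place, no extra copy */
    C2F(daxpy)(&iSize, &dblThree, _pdblZ, &iOne, _pdblRealOut, &iOne);
    return 0;
}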
I am not blaming Scilab here, I am just blaming "vectorization", which can be inefficient when large objects are involved. That's why explicit loops can sometimes be faster than vectorized operations in Matlab or Julia (which both use JIT compilation).
S.
On 15/02/2018 at 17:11, Antoine ELIAS wrote:
> Hello Stéphane,
>
> Interesting ...
>
> In release, we don't ship the headers of the BLAS/LAPACK functions.
> But you can declare them in your C file as extern (and let the linker do its job):
>
> extern int C2F(dgemm) (char *_pstTransA, char *_pstTransB,
>                        int *_piN, int *_piM, int *_piK,
>                        double *_pdblAlpha, double *_pdblA, int *_piLdA,
>                        double *_pdblB, int *_piLdB,
>                        double *_pdblBeta, double *_pdblC, int *_piLdC);
> and
>
> extern int C2F(dscal) (int *_iSize, double *_pdblVal, double *_pdblDest, int *_iInc);
>
> Other BLAS/LAPACK prototypes can be found at
> http://cgit.scilab.org/scilab/tree/scilab/modules/elementary_functions/includes/elem_common.h?h=6.0
>
> Regards,
> Antoine
> On 15/02/2018 at 16:50, Stéphane Mottelet wrote:
> > Hello all,
> >
> > Following the recent discussion with fujimoto, I discovered that Scilab does not (seem to)
> > fully use SIMD operations in BLAS as it should. Besides the bottlenecks of its code, there
> > are also many operations of the kind
> >
> > scalar*matrix
> >
> > Although this operation is correctly delegated to the DSCAL BLAS function (as can be seen
> > in the C function iMultiRealScalarByRealMatrix in
> > modules/ast/src/c/operations/matrix_multiplication.c):
> >
> > > int iMultiRealScalarByRealMatrix(
> > >     double _dblReal1,
> > >     double *_pdblReal2, int _iRows2, int _iCols2,
> > >     double *_pdblRealOut)
> > > {
> > >     int iOne = 1;
> > >     int iSize2 = _iRows2 * _iCols2;
> > >
> > >     C2F(dcopy)(&iSize2, _pdblReal2, &iOne, _pdblRealOut, &iOne);
> > >     C2F(dscal)(&iSize2, &_dblReal1, _pdblRealOut, &iOne);
> > >     return 0;
> > > }
> > in the code below the product "A*1" is likely using only one processor core, as seen
> > in the CPU usage graph and in the elapsed time,
> >
> > A=rand(20000,20000);
> > tic; for i=1:10; A*1; end; toc
> >
> > ans =
> >
> > 25.596843
> >
> > but this second piece of code is more than 8 times faster and uses 100% of the CPU,
> >
> > ONE=ones(20000,1);
> > tic; for i=1:10; A*ONE; end; toc
> >
> > ans =
> >
> > 2.938314
> >
> > with roughly the same number of multiplications. This second computation is delegated
> > to DGEMM (C <- alpha*A*B + beta*C, here with alpha=1 and beta=0):
> >
> > > int iMultiRealMatrixByRealMatrix(
> > >     double *_pdblReal1, int _iRows1, int _iCols1,
> > >     double *_pdblReal2, int _iRows2, int _iCols2,
> > >     double *_pdblRealOut)
> > > {
> > >     double dblOne = 1;
> > >     double dblZero = 0;
> > >
> > >     C2F(dgemm)("n", "n", &_iRows1, &_iCols2, &_iCols1, &dblOne,
> > >                _pdblReal1, &_iRows1,
> > >                _pdblReal2, &_iRows2, &dblZero,
> > >                _pdblRealOut, &_iRows1);
> > >     return 0;
> > > }
> > Maybe my intuition is wrong, but I have the feeling that using dgemm with alpha=0 will
> > be faster than dscal. I plan to test this by making some quick and dirty code linked to
> > Scilab, so my question to the devs is: which #includes should be added at the top of the
> > C source to be able to call dgemm and dscal?
> >
> > Thanks for your help
> >
> > S.
> >
> >
>
_______________________________________________
dev mailing list
[email protected]
http://lists.scilab.org/mailman/listinfo/dev