Hello Stéphane,

AFAIK, like the old interface, the new C++ one allows you to get pointers to the raw double* values, and so lets you write anything to the inputs. However, this kind of behavior is considered a bug, because from the user's point of view (e.g. Scilab beginners) each call is supposed to have inputs (unmodified data) and outputs (computed data).
Note: about the += operator, this is indeed a way to tell the user that the argument is modified in place with a specific "+" operation.

Thanks,
-- Clément

On Thursday, 22 February 2018 at 14:18 +0100, Stéphane Mottelet wrote:
> Really, nobody knows?
>
> S.
>
> On 20/02/2018 at 11:57, Stéphane Mottelet wrote:
> > Hello,
> >
> > Continuing on this subject, I discovered that the new Scilab API allows
> > modifying the input parameters of a function (in-place assignment).
> > For example, I have modified the previous daxpy such that the expression
> >
> > daxpy(2,X,Y)
> >
> > has no output but formally does "Y += 2*X", if such an operator existed
> > in Scilab. In this case there is no matrix copy at all, hence no memory
> > overhead.
> >
> > Was it possible to do this with the previous API?
> >
> > S.
> >
> > On 19/02/2018 at 19:15, Stéphane Mottelet wrote:
> > > Hello,
> > >
> > > After some tests, for the intended use (multiplying a matrix by a
> > > scalar), dgemm is not faster than dscal, but in the C code of
> > > "iMultiRealScalarByRealMatrix" the part which takes most of the CPU
> > > time is the call to "dcopy". For example, on my machine, for a
> > > 10000x10000 matrix, the call to dcopy takes 540 milliseconds and the
> > > call to dscal 193 milliseconds. Continuing my explorations today, I
> > > tried to see how Scilab expressions such as
> > >
> > > Y+2*X
> > >
> > > are parsed and executed. To this purpose I have written an interface
> > > (daxpy.sci and daxpy.c attached) to the BLAS function "daxpy", which
> > > does "y <- y + a*x", and a script comparing the above expression to
> > >
> > > daxpy(2,X,Y)
> > >
> > > for two 10000x10000 matrices.
> > > Here are the results (MacBook Air, Core i7 @ 1.7 GHz):
> > >
> > > daxpy(2,X,Y)
> > > (dcopy: 582 ms)
> > > (daxpy: 211 ms)
> > > elapsed time: 793 ms
> > >
> > > Y+2*X
> > > elapsed time: 1574 ms
> > >
> > > Considering the above values, the explanation is that in "Y+2*X" there
> > > are *two* copies of a 10000x10000 matrix instead of only one in
> > > "daxpy(2,X,Y)". In "Y+2*X+3*Z" there would be three copies, although
> > > there could be only one if daxpy were used twice.
> > >
> > > I am not blaming Scilab here, I am just blaming "vectorization", which
> > > can be inefficient when large objects are used. That's why explicit
> > > loops can sometimes be faster than vectorized operations in Matlab or
> > > Julia (which both use JIT compilation).
> > >
> > > S.
> > >
> > > On 15/02/2018 at 17:11, Antoine ELIAS wrote:
> > > > Hello Stéphane,
> > > >
> > > > Interesting ...
> > > >
> > > > In release, we don't ship the headers of the BLAS/LAPACK functions,
> > > > but you can declare them in your C file as extern (and let the
> > > > linker do its job):
> > > >
> > > > extern int C2F(dgemm) (char *_pstTransA, char *_pstTransB, int *_piN,
> > > >                        int *_piM, int *_piK, double *_pdblAlpha,
> > > >                        double *_pdblA, int *_piLdA,
> > > >                        double *_pdblB, int *_piLdB,
> > > >                        double *_pdblBeta,
> > > >                        double *_pdblC, int *_piLdC);
> > > >
> > > > and
> > > >
> > > > extern int C2F(dscal) (int *_iSize, double *_pdblVal,
> > > >                        double *_pdblDest, int *_iInc);
> > > >
> > > > Other BLAS/LAPACK prototypes can be found at
> > > > http://cgit.scilab.org/scilab/tree/scilab/modules/elementary_functions/includes/elem_common.h?h=6.0
> > > >
> > > > Regards,
> > > > Antoine
> > > >
> > > > On 15/02/2018 at 16:50, Stéphane Mottelet wrote:
> > > > > Hello all,
> > > > >
> > > > > Following the recent discussion with fujimoto, I discovered that
> > > > > Scilab does not (seem to) fully use SIMD operations in BLAS as it
> > > > > should.
> > > > > Besides the bottlenecks of its code, there are also many
> > > > > operations of the kind
> > > > >
> > > > > scalar*matrix
> > > > >
> > > > > Although this operation is correctly delegated to the DSCAL BLAS
> > > > > function (as can be seen in the C function
> > > > > iMultiRealScalarByRealMatrix in
> > > > > modules/ast/src/c/operations/matrix_multiplication.c):
> > > > >
> > > > > > int iMultiRealScalarByRealMatrix(
> > > > > >     double _dblReal1,
> > > > > >     double *_pdblReal2, int _iRows2, int _iCols2,
> > > > > >     double *_pdblRealOut)
> > > > > > {
> > > > > >     int iOne = 1;
> > > > > >     int iSize2 = _iRows2 * _iCols2;
> > > > > >
> > > > > >     C2F(dcopy)(&iSize2, _pdblReal2, &iOne, _pdblRealOut, &iOne);
> > > > > >     C2F(dscal)(&iSize2, &_dblReal1, _pdblRealOut, &iOne);
> > > > > >     return 0;
> > > > > > }
> > > > >
> > > > > in the code below the product "A*1" is likely using only one
> > > > > processor core, as seen on the CPU usage graph and on the elapsed
> > > > > time,
> > > > >
> > > > > A=rand(20000,20000);
> > > > > tic; for i=1:10; A*1; end; toc
> > > > >
> > > > > ans =
> > > > >
> > > > >    25.596843
> > > > >
> > > > > but this second piece of code is more than 8 times faster and uses
> > > > > 100% of the CPU,
> > > > >
> > > > > ONE=ones(20000,1);
> > > > > tic; for i=1:10; A*ONE; end; toc
> > > > >
> > > > > ans =
> > > > >
> > > > >    2.938314
> > > > >
> > > > > with roughly the same number of multiplications.
> > > > > This second computation is delegated to DGEMM
> > > > > (C <- alpha*A*B + beta*C, here with alpha=1 and beta=0):
> > > > >
> > > > > > int iMultiRealMatrixByRealMatrix(
> > > > > >     double *_pdblReal1, int _iRows1, int _iCols1,
> > > > > >     double *_pdblReal2, int _iRows2, int _iCols2,
> > > > > >     double *_pdblRealOut)
> > > > > > {
> > > > > >     double dblOne  = 1;
> > > > > >     double dblZero = 0;
> > > > > >
> > > > > >     C2F(dgemm)("n", "n", &_iRows1, &_iCols2, &_iCols1, &dblOne,
> > > > > >                _pdblReal1, &_iRows1,
> > > > > >                _pdblReal2, &_iRows2, &dblZero,
> > > > > >                _pdblRealOut, &_iRows1);
> > > > > >     return 0;
> > > > > > }
> > > > >
> > > > > Maybe my intuition is wrong, but I have the feeling that using
> > > > > dgemm with beta=0 will be faster than dscal. I plan to test this
> > > > > by making a quick and dirty code linked to Scilab, so my question
> > > > > to the devs is: which #includes should I add at the top of the C
> > > > > source to be able to call dgemm and dscal?
> > > > >
> > > > > Thanks for your help
> > > > >
> > > > > S.
> > --
> > Stéphane Mottelet
> > Ingénieur de recherche
> > EA 4297 Transformations Intégrées de la Matière Renouvelable
> > Département Génie des Procédés Industriels
> > Sorbonne Universités - Université de Technologie de Compiègne
> > CS 60319, 60203 Compiègne cedex
> > Tel : +33(0)344234688
> > http://www.utc.fr/~mottelet

_______________________________________________
dev mailing list
dev@lists.scilab.org
http://lists.scilab.org/mailman/listinfo/dev