Hello,
After some tests, for the intended use (multiplying a matrix by a scalar),
dgemm is not faster than dscal. However, in the C code of
"iMultiRealScalarByRealMatrix", the part that takes most of the CPU
time is the call to "dcopy": for example, on my machine, for a
10000x10000 matrix, the call to dcopy takes 540 milliseconds and the
call to dscal only 193 milliseconds. Continuing my explorations today, I
tried to see how Scilab expressions such as
Y+2*X
are parsed and executed. To this purpose I have written an interface
(daxpy.sci and daxpy.c, attached) to the BLAS function "daxpy", which
computes "y <- y + a*x", and a script comparing the above expression to
daxpy(2,X,Y)
for two 10000x10000 matrices. Here are the results (MacBook Air, Core
i7 @ 1.7 GHz):
daxpy(2,X,Y)
(dcopy: 582 ms)
(daxpy: 211 ms)
elapsed time: 793 ms
Y+2*X
elapsed time: 1574 ms
Considering the above values, the explanation is that "Y+2*X" makes
*two* copies of a 10000x10000 matrix (one for the intermediate result
"2*X", one for the sum) instead of only one in "daxpy(2,X,Y)". In
"Y+2*X+3*Z" there would be three copies, although there could be only
one if daxpy were used twice.
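For illustration, here is a minimal sketch (hypothetical helper, using
the same extern BLAS prototypes as in the attached daxpy.c and the C2F
macro from Scilab's machine.h) of how "Y+2*X+3*Z" could be evaluated
with a single full-size copy:

#include "machine.h" /* C2F macro */

extern int C2F(dcopy)(int *, double *, int *, double *, int *);
extern int C2F(daxpy)(int *, double *, double *, int *, double *, int *);

/* W = Y + 2*X + 3*Z : one full-size copy, then two in-place updates */
static void axpy_chain(int n, double *Y, double *X, double *Z, double *W)
{
    int iOne = 1;
    double a = 2.0, b = 3.0;
    C2F(dcopy)(&n, Y, &iOne, W, &iOne);     /* W <- Y (the only copy) */
    C2F(daxpy)(&n, &a, X, &iOne, W, &iOne); /* W <- W + 2*X */
    C2F(daxpy)(&n, &b, Z, &iOne, W, &iOne); /* W <- W + 3*Z */
}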
I am not blaming Scilab here, I am just blaming "vectorization", which
can be inefficient when large objects are involved. That's why explicit
loops can sometimes be faster than vectorized operations in Matlab or
Julia (which both use JIT compilation).
S.
On 15/02/2018 at 17:11, Antoine ELIAS wrote:
Hello Stéphane,
Interesting ...
In the release, we don't ship the headers of the BLAS/LAPACK functions.
But you can declare them in your C file as extern (and let the linker
do its job):
extern int C2F(dgemm)(char *_pstTransA, char *_pstTransB, int *_piN,
                      int *_piM, int *_piK, double *_pdblAlpha,
                      double *_pdblA, int *_piLdA,
                      double *_pdblB, int *_piLdB, double *_pdblBeta,
                      double *_pdblC, int *_piLdC);

and

extern int C2F(dscal)(int *_iSize, double *_pdblVal, double *_pdblDest,
                      int *_iInc);
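For example, a minimal sketch of using such a declaration (the C2F
macro comes from Scilab's "machine.h"; linking against the BLAS shipped
with Scilab is assumed):

#include "machine.h" /* C2F macro */

extern int C2F(dscal)(int *_iSize, double *_pdblVal, double *_pdblDest,
                      int *_iInc);

/* x <- a*x : scale the n entries of x in place */
static void scale_in_place(double *x, int n, double a)
{
    int iOne = 1;
    C2F(dscal)(&n, &a, x, &iOne);
}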
Other BLAS/LAPACK prototypes can be found at
http://cgit.scilab.org/scilab/tree/scilab/modules/elementary_functions/includes/elem_common.h?h=6.0
Regards,
Antoine
On 15/02/2018 at 16:50, Stéphane Mottelet wrote:
Hello all,
Following the recent discussion with fujimoto, I discovered that
Scilab does not (seem to) fully use SIMD operations in BLAS as it
should. Besides the bottlenecks of its code, there are also many
operations of the kind
scalar*matrix
Although this operation is correctly delegated to the DSCAL BLAS
function (as can be seen in the C function iMultiRealScalarByRealMatrix
in modules/ast/src/c/operations/matrix_multiplication.c):
int iMultiRealScalarByRealMatrix(
    double _dblReal1,
    double *_pdblReal2, int _iRows2, int _iCols2,
    double *_pdblRealOut)
{
    int iOne = 1;
    int iSize2 = _iRows2 * _iCols2;
    /* copy the input matrix into the output, then scale it in place */
    C2F(dcopy)(&iSize2, _pdblReal2, &iOne, _pdblRealOut, &iOne);
    C2F(dscal)(&iSize2, &_dblReal1, _pdblRealOut, &iOne);
    return 0;
}
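As an aside, since the dcopy is what dominates the timing, one could
imagine fusing the copy and the scaling into a single pass over memory.
A purely illustrative sketch, not what Scilab does:

/* Hypothetical fused copy-and-scale: the output buffer is written once
   instead of being traversed twice (dcopy then dscal). */
static void scale_copy(double a, const double *src, double *dst, int n)
{
    int i;
    for (i = 0; i < n; ++i)
    {
        dst[i] = a * src[i];
    }
}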
in the code below, the product "A*1" likely uses only one processor
core, as seen on the CPU usage graph and in the elapsed time:
A=rand(20000,20000);
tic; for i=1:10; A*1; end; toc
ans =
25.596843
but this second piece of code is more than 8 times faster and uses
100% of the CPU:
ONE=ones(20000,1);
tic; for i=1:10; A*ONE; end; toc
ans =
2.938314
with roughly the same number of multiplications. This second
computation is delegated to DGEMM (C <- alpha*A*B + beta*C, here with
alpha=1 and beta=0):
int iMultiRealMatrixByRealMatrix(
    double *_pdblReal1, int _iRows1, int _iCols1,
    double *_pdblReal2, int _iRows2, int _iCols2,
    double *_pdblRealOut)
{
    double dblOne = 1;
    double dblZero = 0;
    C2F(dgemm)("n", "n", &_iRows1, &_iCols2, &_iCols1, &dblOne,
               _pdblReal1, &_iRows1,
               _pdblReal2, &_iRows2, &dblZero,
               _pdblRealOut, &_iRows1);
    return 0;
}
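Incidentally, dgemm degenerates into a pure scaling of C when the inner
dimension k is 0, since the alpha*A*B term then contributes nothing and
C <- beta*C. A minimal sketch of using it that way (hypothetical names,
with C already holding the data to scale):

#include "machine.h" /* C2F macro */

extern int C2F(dgemm)(char *, char *, int *, int *, int *, double *,
                      double *, int *, double *, int *, double *,
                      double *, int *);

/* C <- scalar*C via dgemm with k = 0; A and B are never referenced */
static void scale_with_dgemm(double *C, int rows, int cols, double scalar)
{
    int k = 0;
    int iOne = 1;
    double dblZero = 0.0;
    C2F(dgemm)("n", "n", &rows, &cols, &k, &dblZero,
               C, &rows, /* dummy A, lda >= max(1, rows) */
               C, &iOne, /* dummy B, ldb >= max(1, k) = 1 */
               &scalar, C, &rows);
}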
Maybe my intuition is wrong, but I have the feeling that using dgemm
with alpha=0 would be faster than dscal. I plan to test this by making
a quick and dirty code linked to Scilab, so my question to the devs is:
which #includes have to be added at the top of the C source to be able
to call dgemm and dscal?
Thanks for your help
S.
--
Stéphane Mottelet
Ingénieur de recherche
EA 4297 Transformations Intégrées de la Matière Renouvelable
Département Génie des Procédés Industriels
Sorbonne Universités - Université de Technologie de Compiègne
CS 60319, 60203 Compiègne cedex
Tel : +33(0)344234688
http://www.utc.fr/~mottelet
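---- daxpy.c (attachment) ----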
#include "api_scilab.h"
#include "Scierror.h"
#include "localization.h"
#include "machine.h"
#include <sys/timeb.h>
extern int F2C(dcopy)(int *, double *, int *, double *, int *);
extern int C2F(daxpy) (int *, double *, double *, int *, double *, int *);
uint64_t system_current_time_millis()
{
#if defined(_WIN32) || defined(_WIN64)
    struct _timeb timebuffer;
    _ftime(&timebuffer);
    return (uint64_t)((timebuffer.time * 1000) + timebuffer.millitm);
#else
    struct timeb timebuffer;
    ftime(&timebuffer);
    return (uint64_t)((timebuffer.time * 1000) + timebuffer.millitm);
#endif
}
static const char fname[] = "daxpy";
/* ==================================================================== */
int sci_daxpy(scilabEnv env, int nin, scilabVar* in, int nopt, scilabOpt* opt,
              int nout, scilabVar* out)
{
    int rowAlpha = 0;
    int colAlpha = 0;
    int rowX = 0;
    int colX = 0;
    int rowY = 0;
    int colY = 0;
    int iOne = 1;
    int size = 0;
    double* alpha = NULL;
    double* Y = NULL;
    double* X = NULL;
    double* Z = NULL;
    uint64_t T1, T2;

    if (nin != 3)
    {
        Scierror(77, _("%s: Wrong number of input argument(s): %d expected.\n"),
                 fname, 3);
        return 1;
    }
    if (scilab_isDouble(env, in[0]) == 0 || scilab_isMatrix2d(env, in[0]) == 0
            || scilab_isComplex(env, in[0]) == 1)
    {
        Scierror(999, _("%s: Wrong type for input argument #%d: a real scalar expected.\n"),
                 fname, 1);
        return 1;
    }
    if (scilab_isDouble(env, in[1]) == 0 || scilab_isMatrix2d(env, in[1]) == 0)
    {
        Scierror(999, _("%s: Wrong type for input argument #%d: a real matrix expected.\n"),
                 fname, 2);
        return 1;
    }
    if (scilab_isDouble(env, in[2]) == 0 || scilab_isMatrix2d(env, in[2]) == 0)
    {
        Scierror(999, _("%s: Wrong type for input argument #%d: a real matrix expected.\n"),
                 fname, 3);
        return 1;
    }

    scilab_getDim2d(env, in[0], &rowAlpha, &colAlpha);
    scilab_getDim2d(env, in[1], &rowX, &colX);
    scilab_getDim2d(env, in[2], &rowY, &colY);

    if (rowAlpha * colAlpha != 1)
    {
        Scierror(999, _("%s: Wrong size for input arguments: first argument must be a scalar.\n"),
                 fname);
        return 1;
    }
    if ((rowX != rowY) || (colX != colY))
    {
        Scierror(999, _("%s: Wrong size for input arguments: second and third arguments must have the same dimensions.\n"),
                 fname);
        return 1;
    }

    scilab_getDoubleArray(env, in[0], &alpha);
    scilab_getDoubleArray(env, in[1], &X);
    scilab_getDoubleArray(env, in[2], &Y);

    out[0] = scilab_createDoubleMatrix2d(env, rowX, colX, 0);
    scilab_getDoubleArray(env, out[0], &Z);

    size = rowX * colX;

    /* Z <- Y (full-size copy), then Z <- Z + alpha*X, timing each BLAS call */
    T1 = system_current_time_millis();
    C2F(dcopy)(&size, Y, &iOne, Z, &iOne);
    T2 = system_current_time_millis();
    sciprint(" (dcopy: %d ms)\n", (int)(T2 - T1));
    T1 = T2;
    C2F(daxpy)(&size, alpha, X, &iOne, Z, &iOne);
    T2 = system_current_time_millis();
    sciprint(" (daxpy: %d ms)\n", (int)(T2 - T1));

    return 0;
}
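---- daxpy.sci (attachment) ----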
cd(get_absolute_file_path("daxpy.sci"))
ulink();
files = ["daxpy.c"];
ilib_build('mult_lib', ['daxpy', 'sci_daxpy', 'csci6'], files, []); // "csci6" -> new Scilab 6 API
exec loader.sce
X=rand(10000,10000);
Y=rand(10000,10000);
disp("daxpy(2,X,Y)")
tic;
daxpy(2,X,Y);
disp(sprintf("elapsed time: %d ms",toc()*1000))
disp("Y+2*X")
tic;
Y+2*X;
disp(sprintf("elapsed time: %d ms",toc()*1000))