chrishkchris commented on a change in pull request #535: SINGA-490 Optimize performance of stochastic gradient descent (SGD) URL: https://github.com/apache/incubator-singa/pull/535#discussion_r326004398
########## File path: src/core/tensor/tensor_math_cuda.h ##########

@@ -324,12 +324,8 @@
 void EltwiseMult<float, lang::Cuda>(const Tensor& in, const float x,
                                     Tensor* out, Context* ctx) {
   const float* inPtr = static_cast<const float*>(in.block()->data());
   float* outPtr = static_cast<float*>(out->block()->mutable_data());
-
-  float alpha = x, beta = 0.0;
-  check_cudnn(cudnnAddTensor(ctx->cudnn_handle,
-                             (void*)(&alpha), generate_tensor_nd_desc(in), inPtr,
-                             (void*)(&beta), generate_tensor_nd_desc(*out), outPtr
-                             ));
+  const size_t num = in.Size();

Review comment:
Yes, it must be. cudnnAddTensor adds two tensors, computing alpha*X + beta*Y and writing the result back to Y, whereas cuda::mult performs a pure elementwise multiply. The computational cost of cudnnAddTensor is therefore much higher than that of cuda::mult, so there is no need to use a tensor-add function (with beta = 0) just to implement an elementwise multiply.
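For reference, a dedicated scalar multiply reduces to a single read-multiply-write per element, with none of the descriptor handling or second-operand scaling that cudnnAddTensor performs. Below is a minimal standalone sketch of that kind of kernel; the names scalar_mult and EltwiseMultSketch and the launch configuration are hypothetical illustrations, not SINGA's actual cuda::mult implementation.

#include <cuda_runtime.h>

// One fused multiply per element: out[i] = in[i] * x.
// No tensor descriptors, no broadcast logic, no read of a second
// tensor operand, which is why this is cheaper than calling
// cudnnAddTensor with beta = 0 to get the same result.
__global__ void scalar_mult(const size_t num, const float* in,
                            const float x, float* out) {
  size_t idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < num) out[idx] = in[idx] * x;
}

// Hypothetical host-side wrapper mirroring the arguments the patched
// EltwiseMult has available (element count, raw pointers, stream).
void EltwiseMultSketch(const size_t num, const float* inPtr, const float x,
                       float* outPtr, cudaStream_t stream) {
  const int threads = 256;
  const int blocks = static_cast<int>((num + threads - 1) / threads);
  scalar_mult<<<blocks, threads, 0, stream>>>(num, inPtr, x, outPtr);
}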