pgplus1628 opened a new issue #7407: concat operator implementation
   It seems the current implementation of the `concat` operator is based on
mshadow. When concat receives multiple input NDArrays, the GPU path launches
a kernel many times. TensorFlow has a customized kernel for the concat
operator that needs only a single kernel launch.
   Is there any plan to optimize this?
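For illustration, here is a minimal CPU-side sketch, written in Python, of how a single-launch concat can index its inputs. It assumes 2-D row-major inputs that share a row count and are concatenated along axis 1. It is not MXNet's or TensorFlow's actual code, and the function name `concat_single_pass` is hypothetical. Each iteration of the flat loop stands in for one GPU thread. That "thread" locates its source input with a binary search over prefix-summed column offsets, so all inputs are handled in one pass instead of one launch per input.

```python
from bisect import bisect_right
from itertools import accumulate


def concat_single_pass(inputs):
    """Concatenate 2-D row-major matrices along axis 1 in one pass.

    Hypothetical sketch: each loop iteration plays the role of one GPU
    thread in a single kernel launch covering every output element.
    """
    rows = len(inputs[0])
    widths = [len(m[0]) for m in inputs]
    # offsets[k] is the first output column written by input k.
    offsets = [0] + list(accumulate(widths))
    total_cols = offsets[-1]
    out = [0] * (rows * total_cols)
    for tid in range(rows * total_cols):  # one "thread" per output element
        r, c = divmod(tid, total_cols)
        # Binary search: the last input whose starting offset is <= c owns column c.
        k = bisect_right(offsets, c) - 1
        out[tid] = inputs[k][r][c - offsets[k]]
    # Reshape the flat buffer back into rows.
    return [out[r * total_cols:(r + 1) * total_cols] for r in range(rows)]


a = [[1, 2], [3, 4]]
b = [[5], [6]]
c = [[7, 8, 9], [10, 11, 12]]
print(concat_single_pass([a, b, c]))
```

On a real GPU the offsets array would be copied to device memory once, and a single kernel would run the loop body per thread. This is the property the issue contrasts with launching a kernel many times.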