Hi Marcus, and thanks for your valuable response! I think the template approach (parameterizing each method on its matrix type) is a better solution than a new wrapper class: it requires fewer changes to the current code and is more consistent with the existing style, so I'll go with it. Thanks for your advice!
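For concreteness, here is the kind of interface I have in mind, following the DTree pattern you linked. This is a minimal sketch (Rescale is a hypothetical function, not an existing mlpack API, and it assumes bandicoot exposes the same Armadillo-style members such as .max()):

    #include <armadillo>

    // Minimal sketch of the templated-matrix-type pattern: the same code
    // compiles for arma::mat, arma::fmat, or (in principle) coot::mat.
    // Rescale is hypothetical, not an existing mlpack API.
    template<typename MatType>
    MatType Rescale(const MatType& input)
    {
      // Use MatType and its element type throughout the body; hard-coding
      // arma::mat here would silently break instantiation with coot::mat.
      typedef typename MatType::elem_type ElemType;
      const ElemType maxVal = input.max();
      return input / maxVal;
    }

No wrapper class is needed this way: the caller picks the backend simply by choosing the matrix type used to instantiate the template.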
As for the specific implementation, the situation varies from file to file. As you said, some classes and functions are already implemented as templates, so we can easily substitute coot::mat for arma::mat; but some of the templated classes still hard-code arma::mat somewhere in their implementation instead of using MatType, and others use arma::mat directly in their interfaces. So this is also something I can improve.

As for the conv2 question: the situation I described in my last email is the current implementation; I used NaiveConvolution only as an example to demonstrate what we can gain from using GPUs. I'd also like to optimize the CPU version and/or implement a more generic version of those functions, along the lines of the sketch below.
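Concretely, the generic version I'd aim for might look like the following. This is a sketch under assumptions: arma::conv2 exists today, but a coot::conv2 does not yet; the unqualified conv2 call would resolve to the right backend via argument-dependent lookup once bandicoot provides it:

    #include <armadillo>

    // Backend-generic 2-D convolution (sketch). Only the Armadillo path
    // works today; a coot::conv2 with arma::conv2 semantics is assumed.
    template<typename MatType>
    void Convolve(const MatType& input, const MatType& filter, MatType& output)
    {
      // Unqualified call: ADL picks arma::conv2 for Armadillo types, and
      // would pick coot::conv2 for bandicoot types once it exists.
      output = conv2(input, filter, "same");
    }

    int main()
    {
      arma::mat input(32, 32, arma::fill::randu);
      arma::mat filter(3, 3, arma::fill::randu);
      arma::mat output;
      Convolve(input, filter, output);  // CPU path via arma::conv2
    }

That would let mlpack keep one templated implementation for CPU and GPU, as you suggest, instead of falling back to a backend-specific one.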
The worry I still have is whether mlpack is ready to add GPU support now. It is on the mlpack vision list, but it is not on the GSoC idea list, and the bandicoot library is not yet release-ready, so I'm not sure this is the year to add GPU support to mlpack.

Thanks,
Liu, Zhuojin

On Apr 11, 2022, 04:14 +0800, Marcus Edel <[email protected]> wrote:
> Hello Liu,
>
> > On Apr 6, 2022, at 6:14 AM, Zhuojin Liu <[email protected]> wrote:
> >
> > Hello, everyone.
> >
> > I'm an undergraduate student who's interested in taking part in the GSoC 2022 program, and I'd like to discuss an idea that was not listed on the Summer of Code Ideas list.
> >
> > GPUs are used in a lot of machine learning packages to accelerate training and inference; however, mlpack does not support GPU acceleration so far. I found the plan to add CUDA support to mlpack in <https://www.mlpack.org/papers/vision.pdf>, and I'm interested in it. Therefore, I want to spend the summer adding GPU support to mlpack. This plan includes:
> >
> > 1. Add an mlpack::mat class. Currently, we are using arma::mat in mlpack. However, if we want to compute on GPUs, we have to use coot::mat. The conversion between arma::mat and coot::mat is easy, but doing this conversion manually every time users want to move between them is bothersome, so a wrapper class that manages the device-related information would be useful. The lower-level implementation can use bandicoot and armadillo; we only need to provide APIs with the same semantics and names as arma::mat, plus device-related functions (such as torch.Tensor.to(device)). This looks like a big change, and I don't think I am capable of designing a perfect class, so I want to be discreet and implement this class only after a proper discussion.
>
> I think the better approach is to make sure every mlpack method uses a template type that specifies the matrix type to use. Some of the mlpack implementations already do that:
>
> https://github.com/mlpack/mlpack/blob/de94eee6881e33f1145d9d4a8b5380a5be8af36a/src/mlpack/methods/det/dtree.hpp#L44-L46
>
> which allows me to construct my DTree with:
>
> DTree<arma::mat>, DTree<arma::fmat>, or DTree<coot::mat>, without having another wrapper class around the armadillo and bandicoot matrix types. However, other implementations don't support the same interface:
>
> https://github.com/mlpack/mlpack/blob/de94eee6881e33f1145d9d4a8b5380a5be8af36a/src/mlpack/methods/pca/pca.hpp#L55-L58
>
> is one example which needs to be adapted.
>
> > 2. Implement ANN layers with bandicoot. Whether or not the mlpack::mat class is implemented, we can still implement ANN layers with bandicoot and benefit from GPU acceleration. E.g., currently the naive_convolution iteratively multiplies the corresponding elements of the filter and the input, then adds the result to the output. This can be parallelized easily on GPUs. I want to start with some of the most-used layers and implement their GPU versions.
>
> Do you think a better approach would be to use arma::conv2 instead, which would allow us to implement a fast coot::conv2 version and use the same code in mlpack for CPU and GPU, without falling back to a specific implementation?
>
> > 3. Contribute to bandicoot. Bandicoot is still an unstable library, so we may encounter unexpected situations such as necessary functions not being implemented, or bugs. For example, some CUDA kernels in bandicoot work correctly only if the shape of the input matrix is a factor of 2. Therefore, I want to implement functions and kernels, and fix the bugs I find while implementing the layers, to help make bandicoot release-ready.
> >
> > Thank you for taking the time to read this email; I'm looking forward to hearing your valuable feedback. Have a good day!
> >
> > Regards,
> > Liu, Zhuojin
> > _______________________________________________
> > mlpack mailing list
> > [email protected]
> > http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
>
> Thanks,
> Marcus
