Hello Liu,

> On Apr 6, 2022, at 6:14 AM, Zhuojin Liu <[email protected]> wrote:
>
> Hello, everyone.
>
> I'm an undergraduate student who's interested in taking part in the GSoC
> 2022 program, and I'd like to discuss an idea that was not listed on the
> Summer of Code Ideas list.
>
> GPUs are used by a lot of machine learning packages to accelerate training
> and inference; however, mlpack does not support GPU acceleration so far. I
> found the plan of adding CUDA support to mlpack in
> <https://www.mlpack.org/papers/vision.pdf>, and I'm interested in it.
> Therefore, I want to spend the summer adding GPU support to mlpack. The
> plan includes:
>
> 1. Add an mlpack::mat class. Currently, we are using arma::mat in mlpack.
> However, if we want to compute on GPUs, we have to use coot::mat. The
> conversion between arma::mat and coot::mat is easy, but doing this
> conversion manually every time users want to convert from one to the other
> is tedious, so a wrapper class that manages the device-related information
> would be useful. The lower-level implementation can use bandicoot and
> armadillo; we only need to provide APIs with the same semantics and names
> as arma::mat, plus device-related functions (such as
> torch.Tensor.to(device)). This looks like a big change, and I don't think
> I am capable of designing a perfect class, so I want to be careful and only
> implement it after a proper discussion.

I think the better approach is to make sure every mlpack method uses a
template type that specifies the matrix type to use. Some of the mlpack
implementations already do that:

https://github.com/mlpack/mlpack/blob/de94eee6881e33f1145d9d4a8b5380a5be8af36a/src/mlpack/methods/det/dtree.hpp#L44-L46

which allows me to construct my DTree as DTree<arma::mat>, DTree<arma::fmat>,
or DTree<coot::mat>, without having another wrapper class around the
armadillo and bandicoot matrix types. However, other implementations don't
support the same interface yet:

https://github.com/mlpack/mlpack/blob/de94eee6881e33f1145d9d4a8b5380a5be8af36a/src/mlpack/methods/pca/pca.hpp#L55-L58

is one example which needs to be adapted.
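Just to make the pattern concrete, here is a rough sketch (the helper below
is made up for illustration, it is not existing mlpack code, and it assumes
bandicoot keeps mirroring the Armadillo API closely enough for the
unqualified calls to resolve):

#include <armadillo>
#include <iostream>
// #include <bandicoot>   // needed for the coot::mat variant below

// Hypothetical helper, only here to illustrate the pattern: the matrix type
// is a template parameter, so the same code can be instantiated with
// arma::mat, arma::fmat, or coot::mat.
template<typename MatType>
typename MatType::elem_type SumOfSquares(const MatType& x)
{
  // accu() and operator% are found via ADL: arma::accu for Armadillo types,
  // coot::accu for bandicoot types.
  return accu(x % x);
}

int main()
{
  arma::mat  a(10, 10, arma::fill::randu);
  arma::fmat b(10, 10, arma::fill::randu);

  std::cout << SumOfSquares(a) << " " << SumOfSquares(b) << std::endl;

  // With bandicoot the call would be identical, but would run on the GPU:
  // coot::mat c(10, 10, coot::fill::randu);
  // std::cout << SumOfSquares(c) << std::endl;
}

That is basically what DTree already allows; the adaptation work is to give
the other methods, like PCA, the same template parameter.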
> 2. Implement ANN layers with bandicoot. Whether or not the mlpack::mat
> class gets implemented, we can still implement ANN layers with bandicoot
> and benefit from GPU acceleration. E.g., currently the naive_convolution
> implementation iteratively multiplies the corresponding elements of the
> filter and the input and then adds the result to the output. This can be
> parallelized easily on GPUs. I want to start with some of the most
> commonly used layers and implement their GPU versions.

Do you think a better approach would be to use arma::conv2 instead? That
would allow us to implement a fast coot::conv2 version and use the same code
in mlpack for CPU and GPU, without falling back to a specific implementation.
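Something along these lines, as a rough sketch (not mlpack code; it assumes
a coot::conv2 with the same signature as arma::conv2 gets implemented in
bandicoot first):

#include <armadillo>
#include <iostream>

// A single templated helper that forwards to conv2(). For Armadillo types
// the unqualified call resolves to arma::conv2; for bandicoot types it
// would resolve to coot::conv2 once that exists.
template<typename MatType>
MatType Convolve(const MatType& input, const MatType& filter)
{
  // "same" keeps the output size equal to the input size.
  return conv2(input, filter, "same");
}

int main()
{
  arma::mat input(32, 32, arma::fill::randu);
  arma::mat filter(3, 3, arma::fill::randu);
  arma::mat output = Convolve(input, filter);
  std::cout << "output is " << output.n_rows << "x" << output.n_cols
            << std::endl;

  // Once coot::conv2 exists, the GPU path would look the same:
  // coot::mat gpuInput(32, 32, coot::fill::randu);
  // coot::mat gpuFilter(3, 3, coot::fill::randu);
  // coot::mat gpuOutput = Convolve(gpuInput, gpuFilter);
}

That way the convolution layer itself would not need a GPU-specific code
path at all.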
> 3. Contribute to bandicoot. Bandicoot is still an unstable library, so we
> may encounter unexpected situations, like necessary functions that are not
> implemented yet, or bugs. For example, some CUDA kernels in bandicoot only
> work correctly if the shape of the input matrix is a factor of 2.
> Therefore, I want to implement functions and kernels and fix the bugs I
> find while implementing the layers, to help make bandicoot release-ready.
>
> Thank you for taking the time to read this email; I'm looking forward to
> hearing valuable feedback from you. Have a good day!
>
> Regards
> Liu, Zhuojin

Thanks
Marcus

_______________________________________________
mlpack mailing list
[email protected]
http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
