This particular JIRA is only partially related. Niketan and Nakul worked out the details - the only reason I show up as the reporter is that, if I remember correctly, we split a larger-scoped JIRA for low-level optimizations (GPU, codegen, compression) into individual JIRAs and created the detailed tasks.
Overall, I believe that sparse GPU operations would be very valuable, especially in the context of NLP, graphs, and structured data with categorical features (which often become very sparse after dummy coding), because in these ultra-sparse scenarios dense operations cause unnecessary overheads of orders of magnitude (proportional to the sparsity). However, creating efficient sparse GPU kernels is challenging due to irregularities (e.g., sparsity skew). Compared to CPU operations, there might still be a benefit, depending on the data location of inputs/outputs as well as the GPU's higher memory bandwidth.

Even once we extend the codegen framework to GPUs (which is still on the roadmap for this year), we would still need dense/sparse kernels for the individual operations, because we want to apply codegen only where we can benefit from fusion. Right now we call existing libraries such as cuBLAS and cuDNN, and we have dense kernels for a subset of operations such as unary and binary operations and unary aggregates.

Regarding ramping up on the GPU backend, maybe it's a good idea to first start with the missing dense operations. I'm thinking of:
- statistical functions (e.g., covariance, moment),
- parameterized builtin functions (e.g., grouped aggregates),
- missing unary and binary operations (e.g., bitwise),
- missing reorg operations (e.g., reshape, sort - there should be a library for the latter),
- missing unary, binary, and ternary aggregates,
- missing nary operations (e.g., nary cbind/rbind), etc.
Adding these remaining operations would also help a lot. However, if you're more interested in contributing to the development of sparse kernels, maybe you could implement one or two dense operations, get comfortable, and then move on to sparse operations; the two sketches below give a rough idea of both kinds of kernels. Apart from the kernels, seamless support for sparse operations would also require some integration work on how we pass data, maintain nnz (the number of non-zeros), preallocate sparse outputs, etc.
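To make the dense starting point concrete, here is a minimal sketch of an elementwise unary kernel (exp as an example of a missing unary operation); the name, signature, and launch configuration are illustrative only, not our actual kernel API:

    // Hypothetical dense unary kernel (elementwise exp); illustrative
    // only, not the actual SystemML kernel API.
    #include <cuda_runtime.h>
    #include <math.h>

    // Grid-stride loop, so one launch covers any input length n.
    __global__ void dense_exp(const double* in, double* out, int n) {
      for (int i = blockIdx.x * blockDim.x + threadIdx.x;
           i < n; i += blockDim.x * gridDim.x)
        out[i] = exp(in[i]);
    }

    // Example launch (device buffers d_in/d_out assumed allocated):
    //   int threads = 256;
    //   int blocks = (n + threads - 1) / threads;
    //   dense_exp<<<blocks, threads>>>(d_in, d_out, n);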
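For contrast, a naive sparse matrix-vector multiply over CSR with one thread per row makes the skew problem visible: threads assigned to heavy rows dominate the runtime while threads on short or empty rows sit idle, which is exactly why efficient sparse kernels need more careful work partitioning. Again, just a sketch assuming a standard CSR layout:

    // Sketch of a CSR sparse matrix-vector multiply, one thread per row.
    // rowPtr has numRows+1 entries; colIdx/vals hold the column indices
    // and values of the nnz non-zeros. Illustrative only.
    __global__ void spmv_csr(const int* rowPtr, const int* colIdx,
                             const double* vals, const double* x,
                             double* y, int numRows) {
      int row = blockIdx.x * blockDim.x + threadIdx.x;
      if (row < numRows) {
        double sum = 0.0;
        // Long rows keep this thread busy while its neighbors finish
        // early -> load imbalance under sparsity skew.
        for (int j = rowPtr[row]; j < rowPtr[row + 1]; j++)
          sum += vals[j] * x[colIdx[j]];
        y[row] = sum;
      }
    }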
Regards,
Matthias

On Thu, May 10, 2018 at 8:47 PM, Janardhan <[email protected]> wrote:
> Hi Matthias,
>
> Was this related to long term plan for GPU codegen?
>
> Thank you,
> Janardhan