Hello All,

As some of you may know, a few of us at Quansight have started (in partnership with NVIDIA) looking at Arrow's GPU capabilities. We are excited to help improve and expand Arrow's GPU support, but we do have a few initial scoping questions.
Feel free to break these out into separate discussion threads if needed. Hopefully, some of them will be easy enough to answer.

1. What is the status of the GPU code in Arrow now? E.g. https://github.com/apache/arrow/tree/master/cpp/src/arrow/gpu
   Is anyone actively working on this part of the code base? Are there other folks working on GPU support? I'd love to chat, if so!

2. Should Arrow compute assume that everything fits in memory? Arrow seems to handle data that is larger than memory via the Buffer API. Does using Buffers imply any restrictions that we should be aware of?

3. What is the imagined interface between pyarrow and a GPU DataFrame? One idea is to make the choice between main memory and the GPU totally transparent to the user. Another is to be explicit to the user about where the data lives, for example:

    >>> import pyarrow as pa
    >>> a = pa.array(..., type=...)  # create pyarrow array instance
    >>> a_g = a.to_gpu(<device parameters>)  # send `a` to GPU
    >>> def foo(a):
    ...     return ...  # a function doing operations with `a`
    >>> r = foo(a)  # perform operations with `a`, runs on CPU
    >>> r_g = foo(a_g)  # perform operations with `a_g`, runs on GPU
    >>> assert r == r_g.to_mem()  # results are the same

4. Who has been working on Arrow compute kernels? Are there any design docs or discussions we should look at? We've been following the Gandiva discussions and also the Ursa Labs Roadmap <https://ursalabs.org/tech/#arrow-native-computation-engine>.

5. Should the user be able to switch between compute implementations at runtime, or only at compile time?

6. Arrow's CI doesn't currently seem to support GPUs. If a free GPU CI service were to come along, would Arrow be open to using it?

Other than that, we'd love to know where and how we can plug in and help out!

Be Well
Anthony

--
Asst. Prof. Anthony Scopatz
Nuclear Engineering Program
Mechanical Engineering Dept.
University of South Carolina
scop...@cec.sc.edu
Cell: (512) 827-8239
Book a meeting with me at https://scopatz.youcanbook.me/
Open up an issue: https://github.com/scopatz/me/issues
Check my calendar <https://www.google.com/calendar/embed?src=scopatz%40gmail.com>