This is an automated email from the ASF dual-hosted git repository.

apitrou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new bcfacaa  ARROW-3233: [Python] Add prose documentation for CUDA support
bcfacaa is described below

commit bcfacaafcb181a39d43dbb3d0540c018a5afe157
Author: Antoine Pitrou <anto...@python.org>
AuthorDate: Wed Jan 9 23:12:31 2019 +0100

    ARROW-3233: [Python] Add prose documentation for CUDA support

    It will be harder to add generated API docs without requiring CUDA support on the machine building the docs.

    Author: Antoine Pitrou <anto...@python.org>

    Closes #3359 from pitrou/ARROW-3233-pyarrow-cuda-doc and squashes the following commits:

    40b63f0f <Antoine Pitrou> ARROW-3233: Add prose documentation for CUDA support
---
 docs/source/python/cuda.rst   | 159 ++++++++++++++++++++++++++++++++++++++++++
 docs/source/python/index.rst  |   1 +
 docs/source/python/memory.rst |   3 +
 3 files changed, 163 insertions(+)

diff --git a/docs/source/python/cuda.rst b/docs/source/python/cuda.rst
new file mode 100644
index 0000000..b0150c1
--- /dev/null
+++ b/docs/source/python/cuda.rst
@@ -0,0 +1,159 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. currentmodule:: pyarrow.cuda
+
+CUDA Integration
+================
+
+Arrow is not limited to CPU buffers (located in the computer's main memory,
+also named "host memory"). It also has provisions for accessing buffers
+located on a CUDA-capable GPU device (in "device memory").
+
+.. note::
+   This functionality is optional and must have been enabled at build time.
+   If this is not done by your package manager, you might have to build Arrow
+   yourself.
+
+CUDA Contexts
+-------------
+
+A CUDA context represents access to a particular CUDA-capable device.
+For example, this creates a CUDA context accessing CUDA device number 0::
+
+   >>> from pyarrow import cuda
+   >>> ctx = cuda.Context(0)
+   >>>
+
+CUDA Buffers
+------------
+
+A CUDA buffer can be created by copying data from host memory to the memory
+of a CUDA device, using the :meth:`Context.buffer_from_data` method.
+The source data can be any Python buffer-like object, including Arrow buffers::
+
+   >>> import numpy as np
+   >>> arr = np.arange(4, dtype=np.int32)
+   >>> arr.nbytes
+   16
+   >>> cuda_buf = ctx.buffer_from_data(arr)
+   >>> type(cuda_buf)
+   pyarrow._cuda.CudaBuffer
+   >>> cuda_buf.size     # The buffer's size in bytes
+   16
+   >>> cuda_buf.address  # The buffer's address in device memory
+   30088364544
+   >>> cuda_buf.context.device_number
+   0
+
+Conversely, you can copy back a CUDA buffer to host memory, getting a regular
+CPU buffer::
+
+   >>> buf = cuda_buf.copy_to_host()
+   >>> type(buf)
+   pyarrow.lib.Buffer
+   >>> np.frombuffer(buf, dtype=np.int32)
+   array([0, 1, 2, 3], dtype=int32)
+
+.. warning::
+   Many Arrow functions expect a CPU buffer but will not check the buffer's
+   actual type. You will get a crash if you pass a CUDA buffer to such a
+   function::
+
+      >>> pa.py_buffer(b"x" * 16).equals(cuda_buf)
+      Segmentation fault
+
+Numba Integration
+-----------------
+
+There is not much you can do directly with Arrow CUDA buffers from Python,
+but they support interoperation with `Numba <https://numba.pydata.org/>`_,
+a JIT compiler which can turn Python code into optimized CUDA kernels.
+
+Arrow to Numba
+~~~~~~~~~~~~~~
+
+First let's define a Numba CUDA kernel operating on an ``int32`` array. Here,
+we will simply increment each array element (assuming the array is writable)::
+
+   import numba.cuda
+
+   @numba.cuda.jit
+   def increment_by_one(an_array):
+       pos = numba.cuda.grid(1)
+       if pos < an_array.size:
+           an_array[pos] += 1
+
+Then we need to wrap our CUDA buffer into a Numba "device array" with the right
+array metadata (shape, strides and datatype). This is necessary so that Numba
+can identify the array's characteristics and compile the kernel with the
+appropriate type declarations.
+
+In this case the metadata can simply be taken from the original NumPy array.
+Note the GPU data isn't copied, just pointed to::
+
+   >>> from numba.cuda.cudadrv.devicearray import DeviceNDArray
+   >>> device_arr = DeviceNDArray(arr.shape, arr.strides, arr.dtype, gpu_data=cuda_buf.to_numba())
+
+(Ideally we could have defined an Arrow array in CPU memory, copied it to CUDA
+memory without losing type information, and then invoked the Numba kernel on it
+without constructing the DeviceNDArray by hand; this is not yet possible.)
+
+Finally we can run the Numba CUDA kernel on the Numba device array (here
+with 16 blocks of 16 threads each)::
+
+   >>> increment_by_one[16, 16](device_arr)
+
+And the results can be checked by copying back the CUDA buffer to CPU memory::
+
+   >>> np.frombuffer(cuda_buf.copy_to_host(), dtype=np.int32)
+   array([1, 2, 3, 4], dtype=int32)
+
+Numba to Arrow
+~~~~~~~~~~~~~~
+
+Conversely, a Numba-created device array can be viewed as an Arrow CUDA buffer,
+using the :meth:`CudaBuffer.from_numba` factory method.
+
+For the sake of example, let's first create a Numba device array::
+
+   >>> arr = np.arange(10, 14, dtype=np.int32)
+   >>> arr
+   array([10, 11, 12, 13], dtype=int32)
+   >>> device_arr = numba.cuda.to_device(arr)
+
+Then we can create a CUDA buffer pointing to the device array's memory.
+We don't need to pass a CUDA context explicitly this time: the appropriate
+CUDA context is automatically retrieved and adapted from the Numba object.
+
+::
+
+   >>> cuda_buf = cuda.CudaBuffer.from_numba(device_arr.gpu_data)
+   >>> cuda_buf.size
+   16
+   >>> cuda_buf.address
+   30088364032
+   >>> cuda_buf.context.device_number
+   0
+
+Of course, we can copy the CUDA buffer back to host memory::
+
+   >>> np.frombuffer(cuda_buf.copy_to_host(), dtype=np.int32)
+   array([10, 11, 12, 13], dtype=int32)
+
+.. seealso::
+   Documentation for Numba's `CUDA support <https://numba.pydata.org/numba-doc/latest/cuda/index.html>`_.
diff --git a/docs/source/python/index.rst b/docs/source/python/index.rst
index fe04a73..9f96771 100644
--- a/docs/source/python/index.rst
+++ b/docs/source/python/index.rst
@@ -43,6 +43,7 @@ files into Arrow structures.
    pandas
    csv
    parquet
+   cuda
    extending
    api
    development
diff --git a/docs/source/python/memory.rst b/docs/source/python/memory.rst
index 0d30866..ba66807 100644
--- a/docs/source/python/memory.rst
+++ b/docs/source/python/memory.rst
@@ -109,6 +109,9 @@ the buffer is garbaged-collected, all of the memory is freed:
     buf = None
     pa.total_allocated_bytes()
 
+.. seealso::
+   On-GPU buffers using Arrow's optional :doc:`CUDA integration <cuda>`.
+
 Input and Output
 ================
 
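
A note on the kernel launch in cuda.rst above: the call
increment_by_one[16, 16](device_arr) starts far more threads than the
four-element array has slots, which is why the kernel guards the write with
"pos < an_array.size". The CPU-only sketch below emulates that index mapping
so it can be checked without a GPU. It assumes Numba's documented convention
that kernel[griddim, blockdim] launches griddim blocks of blockdim threads and
that numba.cuda.grid(1) equals threadIdx.x + blockIdx.x * blockDim.x.
simulate_launch is a hypothetical helper written for this illustration (it is
not part of Numba or pyarrow), and a plain Python list stands in for the
device array.

```python
# CPU-only emulation of a 1-D CUDA launch; no GPU, Numba, or pyarrow needed.
# simulate_launch is a hypothetical helper, not a Numba or pyarrow API.


def increment_by_one(an_array, pos):
    # Same guard as the kernel in the diff: threads whose index falls
    # past the end of the array do nothing.
    if pos < len(an_array):
        an_array[pos] += 1


def simulate_launch(kernel, blocks, threads_per_block, an_array):
    # Visit every (block, thread) pair once, computing the flat index the
    # way numba.cuda.grid(1) is documented to for a 1-D launch.
    for block_idx in range(blocks):
        for thread_idx in range(threads_per_block):
            pos = thread_idx + block_idx * threads_per_block
            kernel(an_array, pos)
    return blocks * threads_per_block


values = [0, 1, 2, 3]
total_threads = simulate_launch(increment_by_one, 16, 16, values)
print(total_threads, values)  # 256 [1, 2, 3, 4]
```

Only the first 4 of the 256 emulated threads touch the array; without the
bounds check the remaining 252 would index past its end.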