That makes a lot of sense. In some contexts it could be useful to run
multiple Plasma stores per machine (for example, one per device or per
NUMA zone), though that could make it slightly harder to take advantage
of faster GPU-to-GPU communication.

On Wed, Aug 16, 2017 at 2:01 PM Philipp Moritz <pcmor...@gmail.com> wrote:

> One observation here is that, as far as I know, shared memory is not
> typically used between multiple GPUs, and on a single GPU there is
> already a unified shared address space that each CUDA thread can access.
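>
> To make that concrete, here is a minimal host-side C++ sketch of the
> unified (managed) address space, using the standard CUDA runtime API
> (error handling omitted for brevity):
>
>   #include <cuda_runtime.h>
>
>   int main() {
>     float* data = nullptr;
>     // cudaMallocManaged returns one pointer that is valid on both the
>     // host and the device; the driver migrates pages on demand.
>     cudaMallocManaged(&data, 1024 * sizeof(float));
>     data[0] = 1.0f;            // write from the CPU
>     // ... a kernel could read and write the same pointer here ...
>     cudaDeviceSynchronize();   // wait for any outstanding device work
>     cudaFree(data);
>     return 0;
>   }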
>
> Given these constraints, one reasonable extension of the APIs and
> facilities would be the following:
>
> 1.) Extend plasma::Create to take an optional flag (CPU/HOST/SHARED,
> GPU0, GPU1, etc.) that allocates the object on the desired device (host
> shared memory, GPU 0, GPU 1, etc.)
>
> 2.) Extend plasma::Get to take the same flag, transparently copy the
> data to the desired device as necessary, and return a pointer that is
> valid on the specified device (a rough signature sketch follows this
> list).
>
> 3.) Extend the status and notification APIs, as well as object lifetime
> tracking, to account for these changes.
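>
> Concretely, the signatures might look roughly like the following. This
> is purely an illustrative sketch; the Device enum and the extra
> parameter are made up here, not existing Plasma API:
>
>   // Hypothetical device selector: kHost = POSIX shared memory,
>   // non-negative values = CUDA device ordinals.
>   enum class Device : int { kHost = -1, kGpu0 = 0, kGpu1 = 1 };
>
>   // (1) Allocate the object's buffer on the requested device.
>   Status Create(const ObjectID& object_id, int64_t data_size,
>                 uint8_t* metadata, int64_t metadata_size,
>                 std::shared_ptr<Buffer>* data,
>                 Device device = Device::kHost);
>
>   // (2) Return buffers whose addresses are valid on the requested
>   // device, copying the data over transparently if an object currently
>   // lives elsewhere.
>   Status Get(const std::vector<ObjectID>& object_ids, int64_t timeout_ms,
>              std::vector<ObjectBuffer>* object_buffers,
>              Device device = Device::kHost);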
>
> I wonder if people would find that useful; let me know your thoughts!
> Ideally we would also have some integration into, say, TensorFlow or
> other deep learning frameworks that can make use of these capabilities
> (the way we typically use GPUs in Ray at the moment is mostly through
> TensorFlow, by feeding data through placeholders, which has some
> performance bottlenecks, but so far we have mostly managed to work
> around them).
>
>
>
> On Wed, Aug 16, 2017 at 1:01 PM, Wes McKinney <wesmck...@gmail.com> wrote:
>
> > One question is whether the Plasma object store could be extended to
> > support devices other than POSIX shared memory, such as GPU device
> > memory (or multiple GPUs on a single host).
> >
> > Philipp or Robert or any of the people who know the Plasma code best,
> > any idea how this might be approached? It would have to be developed
> > as an optional extension, so that users without a CUDA installation
> > don't have to bother with nvcc (which is proprietary) or the CUDA
> > runtime libraries.
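> >
> > To keep that dependency optional, the CUDA-specific path could sit
> > behind a compile-time flag, along these lines (a sketch only; the
> > PLASMA_GPU macro and AllocateOnDevice helper are made-up names):
> >
> >   #include "arrow/status.h"
> >
> >   #ifdef PLASMA_GPU  // defined only when the build finds CUDA
> >   #include <cuda_runtime.h>
> >
> >   // Allocate `size` bytes of device memory on the given GPU.
> >   arrow::Status AllocateOnDevice(int device, int64_t size,
> >                                  uint8_t** out) {
> >     cudaSetDevice(device);
> >     if (cudaMalloc(reinterpret_cast<void**>(out), size) != cudaSuccess) {
> >       return arrow::Status::OutOfMemory("cudaMalloc failed");
> >     }
> >     return arrow::Status::OK();
> >   }
> >   #else
> >   arrow::Status AllocateOnDevice(int, int64_t, uint8_t**) {
> >     return arrow::Status::NotImplemented("built without GPU support");
> >   }
> >   #endif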
> >
> > - Wes
> >
> > On Mon, Aug 7, 2017 at 2:15 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> > > hi all,
> > >
> > > A group of companies has created a project called the GPU Open
> > > Analytics Initiative (GOAI), with the purpose of developing open
> > > source software and specifications for analytics on GPUs.
> > >
> > > So far, they have focused on building a "GPU Data Frame", which
> > > effectively puts Arrow data on the GPU:
> > >
> > > https://github.com/gpuopenanalytics/libgdf/wiki/Technical-Overview
> > > http://gpuopenanalytics.com/
> > >
> > > Shared memory IPC and analytics on Arrow data beyond the CPU are
> > > definitely in scope for the Arrow project, so we should look for ways
> > > to collaborate and help each other. I am sure this will not be the
> > > last time that someone needs to use Arrow memory with GPUs, so it
> > > would be useful for the community to develop memory management and
> > > utility code to assist with using Arrow in a mixed-device setting.
> > >
> > > I am not sure how to best proceed but wanted to make everyone aware of
> > > GOAI and look for opportunities to grow the Arrow community.
> > >
> > > Thanks,
> > > Wes
> >
>
