Re: Exploring the possibility of creating a persistent cache by arrow/plasma

Philipp Moritz Sun, 21 Jan 2018 16:27:37 -0800

Note that in the Python bindings, reference counting is handled
automatically; see

https://github.com/apache/arrow/blob/master/python/pyarrow/plasma.pyx#L182

which is used, for example, as the base object for numpy arrays whose
memory is backed by the object store.

On Sun, Jan 21, 2018 at 4:21 PM, Robert Nishihara
<robertnishih...@gmail.com> wrote:

> Evicted objects are gone for good, although it would certainly be possible
> to add the ability to persist them to disk.
>
> The Plasma store does reference counting to figure out which clients are
> using which objects. Clients can "release" objects through the client API
> to decrement the reference count. The Plasma store also keeps track of when
> a client exits/dies and automatically gets rid of the reference counts for
> that client.
>
> On Sun, Jan 21, 2018 at 4:09 PM Mike Sam <mikesam...@gmail.com> wrote:
>
> > Great, thank you very much.
> >
> > What happens to the evicted objects? Are they gone for good, or are they
> > persisted locally?
> >
> > Also, what defines "objects that are not currently in use by any client"?
> > Is it reference counting?
> >
> >
> >
> > On Sat, Jan 20, 2018 at 1:53 PM, Robert Nishihara
> > <robertnishih...@gmail.com> wrote:
> >
> > > When Plasma is started up, you specify the total amount of memory it is
> > > allowed to use (in bytes) with the -m flag.
> > >
> > > When a Plasma client attempts to create a new object and there is not
> > > enough memory in the store, the store will evict a bunch of unused
> > > objects to free up memory (objects that are not currently in use by any
> > > client). This is done in a least-recently-used fashion as defined in the
> > > eviction policy:
> > > https://github.com/apache/arrow/blob/master/cpp/src/plasma/eviction_policy.h
> > > In principle, this eviction policy could be made more configurable or a
> > > different eviction policy could be plugged in, though we haven't
> > > experimented with that much.
> > >
> > > If you want to manually delete an object from Plasma, that can be done
> > > with the "Delete" command
> > > https://github.com/apache/arrow/blob/d135974a0d3dd9a9fbbb10da4c5dbc65f9324234/cpp/src/plasma/client.h#L186,
> > > which is part of the C++ Plasma client API but has not been exposed
> > > through Python yet.
> > >
> > > For now, if you want to make sure that an object will not be evicted
> > > (e.g., from the C++ Client API), you can call Get on the object ID and
> > > then it will not be evicted before you call Release from the same client.
> > >
> > > On Fri, Jan 19, 2018 at 5:17 PM Mike Sam <mikesam...@gmail.com> wrote:
> > >
> > > > Thank you, Robert, for your answer.
> > > >
> > > > Could you kindly elaborate further on number 1, as I am not yet
> > > > familiar with the Plasma codebase?
> > > > Are you saying persistence is available out of the box? If not, what
> > > > specific things need to be added to the Plasma codebase to make this
> > > > happen?
> > > >
> > > > Thank you,
> > > > Mike
> > > >
> > > >
> > > >
> > > > On Thu, Jan 18, 2018 at 11:43 PM, Robert Nishihara <
> > > > robertnishih...@gmail.com> wrote:
> > > >
> > > > > Hi Mike,
> > > > >
> > > > > 1. I think yes, though we'd need to turn off the automatic LRU
> > > > > eviction that happens when the store fills up.
> > > > >
> > > > > 3. I think there are some edge cases and it depends on what is in your
> > > > > DataFrame, but at least if it consists of numerical data then the two
> > > > > representations should use the same underlying data in shared memory.
> > > > >
> > > > > On Thu, Jan 18, 2018 at 11:37 PM Mike Sam <mikesam...@gmail.com> wrote:
> > > > >
> > > > > > I am interested in implementing an Arrow-based persisted cache store
> > > > > > and I have a few related questions:
> > > > > >
> > > > > >    1. Is it possible just to use Plasma for this goal?
> > > > > >       (My understanding is that it is not persistable.)
> > > > > >       Else, what is the recommended way to do so?
> > > > > >    2. Is Feather the better file format for persistence to avoid
> > > > > >       re-transcoding hot chunks?
> > > > > >    3. When Pandas loads data from Plasma/Arrow, is it doubling the
> > > > > >       memory usage? (One for the Arrow representation, one for the
> > > > > >       Pandas representation)
> > > > > >
> > > > > > --
> > > > > > Thanks,
> > > > > > Mike
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Thanks,
> > > > Mike
> > > >
> > >
> >
> >
> >
> > --
> > Thanks,
> > Mike
> >
>
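
The eviction and pinning behaviour described in this thread can be
illustrated with a small toy model. This is only a sketch under stated
assumptions: it is plain Python, it does not use the Plasma client API, and
it is not the logic in eviction_policy.h. The class ToyStore, its methods,
and the client IDs are hypothetical names made up for illustration. It shows
a fixed memory budget, least-recently-used eviction restricted to objects
that no client holds, a get/release pair that pins and unpins an object, and
cleanup of a client's references when it disconnects.

```python
from collections import OrderedDict


class ToyStore:
    """Hypothetical toy store: LRU eviction of objects no client is using."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.objects = OrderedDict()  # object_id -> bytes, least recently used first
        self.refs = {}                # object_id -> {client_id: reference count}

    def create(self, object_id, data):
        if object_id in self.objects:
            raise KeyError(f"{object_id!r} already exists")
        if len(data) > self.capacity:
            raise MemoryError("object is larger than the whole store")
        needed = self.used + len(data) - self.capacity
        if needed > 0:
            self._evict(needed)
        self.objects[object_id] = data
        self.refs[object_id] = {}
        self.used += len(data)

    def get(self, client_id, object_id):
        """Return the object and pin it for this client until release()."""
        data = self.objects[object_id]       # KeyError if missing or evicted
        self.objects.move_to_end(object_id)  # mark as most recently used
        counts = self.refs[object_id]
        counts[client_id] = counts.get(client_id, 0) + 1
        return data

    def release(self, client_id, object_id):
        counts = self.refs[object_id]
        counts[client_id] -= 1
        if counts[client_id] == 0:
            del counts[client_id]

    def disconnect(self, client_id):
        """Drop every reference held by a client that exited or died."""
        for counts in self.refs.values():
            counts.pop(client_id, None)

    def _evict(self, needed):
        # Walk from least to most recently used, skipping pinned objects.
        victims, freed = [], 0
        for object_id, data in self.objects.items():
            if freed >= needed:
                break
            if not self.refs[object_id]:
                victims.append(object_id)
                freed += len(data)
        if freed < needed:
            raise MemoryError("not enough unpinned objects to evict")
        for object_id in victims:
            self.used -= len(self.objects.pop(object_id))
            del self.refs[object_id]


store = ToyStore(capacity_bytes=10)
store.create("a", b"aaaa")
store.create("b", b"bbbb")
store.get("client-1", "a")      # pins "a" for client-1
store.create("c", b"cccc")      # store is full: unpinned "b" is evicted
store.release("client-1", "a")  # "a" is evictable again
```

In this sketch an evicted object is simply dropped, which matches the
"gone for good" behaviour described above; persisting victims to disk before
dropping them would be one place to extend the model.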
