Re: Exploring the possibility of creating a persistent cache by arrow/plasma

Robert Nishihara Sun, 21 Jan 2018 16:22:37 -0800

Evicted objects are gone for good, although it would certainly be possible
to add the ability to persist them to disk.


The Plasma store does reference counting to figure out which clients are
using which objects. Clients can "release" objects through the client API
to decrement the reference count. The Plasma store also keeps track of when
a client exits/dies and automatically gets rid of the reference counts for
that client.
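
To make the interplay between reference counting and LRU eviction concrete,
here is a toy Python sketch. It is not Plasma's implementation or API: the
class ToyStore and all of its method names are hypothetical, and it only
models the behaviour described in this thread (a fixed memory budget,
per-client reference counts, eviction of unreferenced objects in
least-recently-used order, and dropping a client's references when it exits).

```python
from collections import OrderedDict


class ToyStore:
    """Hypothetical model of a store with LRU eviction and per-client refcounts."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        # object_id -> size; iteration order is least- to most-recently used.
        self.objects = OrderedDict()
        # object_id -> {client_id: reference count}
        self.refs = {}

    def create(self, object_id, size):
        if object_id in self.objects:
            raise ValueError("object already exists")
        if size > self.capacity:
            raise MemoryError("object is larger than the store")
        # Evict unreferenced objects, oldest first, until the new one fits.
        for victim in list(self.objects):
            if self.used + size <= self.capacity:
                break
            if not self.refs.get(victim):
                self.used -= self.objects.pop(victim)
                self.refs.pop(victim, None)
        if self.used + size > self.capacity:
            raise MemoryError("remaining objects are all in use")
        self.objects[object_id] = size
        self.used += size

    def get(self, client, object_id):
        if object_id not in self.objects:
            return False  # an evicted object is gone for good
        counts = self.refs.setdefault(object_id, {})
        counts[client] = counts.get(client, 0) + 1
        self.objects.move_to_end(object_id)  # mark as most recently used
        return True

    def release(self, client, object_id):
        counts = self.refs.get(object_id, {})
        if counts.get(client, 0) > 0:
            counts[client] -= 1
            if counts[client] == 0:
                del counts[client]

    def disconnect(self, client):
        # When a client exits or dies, all of its references are dropped.
        for counts in self.refs.values():
            counts.pop(client, None)


store = ToyStore(capacity_bytes=100)
store.create("a", 60)
store.get("client1", "a")   # "a" is now in use by client1
store.create("b", 30)
store.create("c", 40)       # does not fit: "b" is evicted, "a" is kept
store.disconnect("client1")
store.create("d", 50)       # "a" is no longer referenced, so it is evicted
```

In this model, "in use" simply means that at least one client holds a
non-zero reference count on the object.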

On Sun, Jan 21, 2018 at 4:09 PM Mike Sam <[email protected]> wrote:

> Great, thank you very much.
>
> What happens to the evicted objects? Are they
> gone for good or are they persisted locally?
>
> Also, what defines "objects that are not currently in use by any client"?
> Reference counting?
>
>
>
> On Sat, Jan 20, 2018 at 1:53 PM, Robert Nishihara <[email protected]> wrote:
>
> > When Plasma is started up, you specify the total amount of memory it is
> > allowed to use (in bytes) with the -m flag.
> >
> > When a Plasma client attempts to create a new object and there is not
> > enough memory in the store, the store will evict a bunch of unused objects
> > to free up memory (objects that are not currently in use by any client).
> > This is done in a least-recently-used fashion as defined in the eviction
> > policy
> > https://github.com/apache/arrow/blob/master/cpp/src/plasma/eviction_policy.h.
> > In principle, this eviction policy could be made more configurable or a
> > different eviction policy could be plugged in, though we haven't
> > experimented with that much.
> >
> > If you want to manually delete an object from Plasma, that can be done with
> > the "Delete" command
> > https://github.com/apache/arrow/blob/d135974a0d3dd9a9fbbb10da4c5dbc65f9324234/cpp/src/plasma/client.h#L186,
> > which is part of the C++ Plasma client API but has not been exposed through
> > Python yet.
> >
> > For now, if you want to make sure that an object will not be evicted (e.g.,
> > from the C++ Client API), you can call Get on the object ID and then it
> > will not be evicted before you call Release from the same client.
> >
> > On Fri, Jan 19, 2018 at 5:17 PM Mike Sam <[email protected]> wrote:
> >
> > > Thank you, Robert, for your answer.
> > >
> > > Could you kindly elaborate further on number 1, as I am not
> > > familiar with the Plasma codebase yet?
> > > Are you saying persistence is available out of the box? If not, what
> > > specific things need to be added
> > > to the Plasma codebase to make this happen?
> > >
> > > Thank you,
> > > Mike
> > >
> > >
> > >
> > > On Thu, Jan 18, 2018 at 11:43 PM, Robert Nishihara <
> > > [email protected]> wrote:
> > >
> > > > Hi Mike,
> > > >
> > > > 1. I think yes, though we'd need to turn off the automatic LRU eviction
> > > > that happens when the store fills up.
> > > >
> > > > 3. I think there are some edge cases and it depends what is in your
> > > > DataFrame, but at least if it consists of numerical data then the two
> > > > representations should use the same underlying data in shared memory.
> > > >
> > > > On Thu, Jan 18, 2018 at 11:37 PM Mike Sam <[email protected]> wrote:
> > > >
> > > > > I am interested in implementing an Arrow-based persisted cache store
> > > > > and I have a few related questions:
> > > > >
> > > > >    1. Is it possible just to use Plasma for this goal?
> > > > >       (My understanding is that it is not persistable.)
> > > > >       Else, what is the recommended way to do so?
> > > > >    2. Is Feather the better file format for persistence to avoid
> > > > >       re-transcoding hot chunks?
> > > > >    3. When pandas loads data from Plasma/Arrow, is it doubling the
> > > > >       memory usage? (One for the Arrow representation, one for the
> > > > >       pandas representation.)
> > > > >
> > > > > --
> > > > > Thanks,
> > > > > Mike
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks,
> > > Mike
> > >
> >
>
>
>
> --
> Thanks,
> Mike
>
