Note that for the Python bindings, reference counting is done automatically; see
https://github.com/apache/arrow/blob/master/python/pyarrow/plasma.pyx#L182, which is used, e.g., as the base object for NumPy arrays whose memory is backed by the object store.

On Sun, Jan 21, 2018 at 4:21 PM, Robert Nishihara <robertnishih...@gmail.com> wrote:
> Evicted objects are gone for good, although it would certainly be possible
> to add the ability to persist them to disk.
>
> The Plasma store does reference counting to figure out which clients are
> using which objects. Clients can "release" objects through the client API
> to decrement the reference count. The Plasma store also keeps track of when
> a client exits or dies and automatically drops the reference counts held by
> that client.
>
> On Sun, Jan 21, 2018 at 4:09 PM Mike Sam <mikesam...@gmail.com> wrote:
>
> > Great, thank you very much.
> >
> > What happens to the evicted objects? Are they
> > gone for good, or are they persisted locally?
> >
> > Also, what defines "objects that are not currently in use by any client"?
> > Reference counting?
> >
> > On Sat, Jan 20, 2018 at 1:53 PM, Robert Nishihara <robertnishih...@gmail.com> wrote:
> >
> > > When Plasma is started up, you specify the total amount of memory it is
> > > allowed to use (in bytes) with the -m flag.
> > >
> > > When a Plasma client attempts to create a new object and there is not
> > > enough memory in the store, the store will evict a bunch of unused objects
> > > to free up memory (objects that are not currently in use by any client).
> > > This is done in a least-recently-used fashion, as defined in the eviction
> > > policy https://github.com/apache/arrow/blob/master/cpp/src/plasma/eviction_policy.h.
> > > In principle, this eviction policy could be made more configurable, or a
> > > different eviction policy could be plugged in, though we haven't
> > > experimented with that much.
> > >
> > > If you want to manually delete an object from Plasma, that can be done with
> > > the "Delete" command
> > > https://github.com/apache/arrow/blob/d135974a0d3dd9a9fbbb10da4c5dbc65f9324234/cpp/src/plasma/client.h#L186,
> > > which is part of the C++ Plasma client API but has not been exposed through
> > > Python yet.
> > >
> > > For now, if you want to make sure that an object will not be evicted (e.g.,
> > > from the C++ client API), you can call Get on the object ID; it will then
> > > not be evicted until you call Release from the same client.
> > >
> > > On Fri, Jan 19, 2018 at 5:17 PM Mike Sam <mikesam...@gmail.com> wrote:
> > >
> > > > Thank you, Robert, for your answer.
> > > >
> > > > Could you kindly elaborate further on number 1, as I am not
> > > > familiar with the Plasma codebase yet?
> > > > Are you saying persistence is available out of the box? If not, what
> > > > specific things need to be added
> > > > to the Plasma codebase to make this happen?
> > > >
> > > > Thank you,
> > > > Mike
> > > >
> > > > On Thu, Jan 18, 2018 at 11:43 PM, Robert Nishihara <robertnishih...@gmail.com> wrote:
> > > >
> > > > > Hi Mike,
> > > > >
> > > > > 1. I think yes, though we'd need to turn off the automatic LRU eviction
> > > > > that happens when the store fills up.
> > > > >
> > > > > 3. I think there are some edge cases and it depends on what is in your
> > > > > DataFrame, but at least if it consists of numerical data, then the two
> > > > > representations should use the same underlying data in shared memory.
> > > > >
> > > > > On Thu, Jan 18, 2018 at 11:37 PM Mike Sam <mikesam...@gmail.com> wrote:
> > > > >
> > > > > > I am interested in implementing an Arrow-based persistent cache store, and I
> > > > > > have a few related questions:
> > > > > >
> > > > > > 1. Is it possible to just use Plasma for this goal?
> > > > > >    (My understanding is that it is not persistable.)
> > > > > >    If not, what is the recommended way to do so?
> > > > > > 2. Is Feather the better file format for persistence to avoid
> > > > > >    re-transcoding hot chunks?
> > > > > > 3. When pandas loads data from Plasma/Arrow, does it double the memory
> > > > > >    usage? (One copy for the Arrow representation, one for the pandas
> > > > > >    representation.)
> > > > > >
> > > > > > --
> > > > > > Thanks,
> > > > > > Mike
> > > >
> > > > --
> > > > Thanks,
> > > > Mike
> >
> > --
> > Thanks,
> > Mike
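The store behaviour described in this thread (a fixed memory budget, per-client reference counts, LRU eviction of objects no client references, Get pinning an object until Release, and a dying client losing its references) can be modeled in a few lines. What follows is a minimal, hypothetical Python sketch and not the Plasma API: the class `ToyObjectStore` and its methods `create`, `get`, `release`, `delete`, and `disconnect` are illustrative names, `capacity_bytes` merely stands in for the store's -m flag, and objects are just size entries with no shared memory or sealing.

```python
from collections import Counter, OrderedDict


class ToyObjectStore:
    """Hypothetical in-memory model of the eviction rules described in this thread.

    NOT the Plasma API. `capacity_bytes` stands in for the store's -m flag;
    objects are only size entries, and an evicted object is simply dropped.
    """

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        # object_id -> size in bytes; dict order doubles as LRU order (oldest first).
        self.sizes = OrderedDict()
        # object_id -> Counter mapping client_id -> outstanding references.
        self.refs = {}

    def contains(self, object_id):
        return object_id in self.sizes

    def in_use(self, object_id):
        # "In use by any client" means some client still holds a reference.
        return sum(self.refs[object_id].values()) > 0

    def _remove(self, object_id):
        self.used -= self.sizes.pop(object_id)
        del self.refs[object_id]

    def _evict(self, needed):
        """Evict unreferenced objects, least recently used first, until `needed` bytes are free."""
        free = self.capacity - self.used
        victims = []
        for object_id, size in self.sizes.items():
            if free >= needed:
                break
            if not self.in_use(object_id):
                victims.append(object_id)
                free += size
        if free < needed:
            return False  # even evicting every unreferenced object would not suffice
        for object_id in victims:
            self._remove(object_id)
        return True

    def create(self, client_id, object_id, size):
        if object_id in self.sizes:
            raise ValueError(f"object {object_id!r} already exists")
        if not self._evict(size):
            raise MemoryError(f"cannot free {size} bytes for {object_id!r}")
        self.sizes[object_id] = size
        self.used += size
        # Toy assumption: the creating client holds one reference until it releases.
        self.refs[object_id] = Counter({client_id: 1})

    def get(self, client_id, object_id):
        # Raises KeyError if the object was evicted or deleted.
        self.refs[object_id][client_id] += 1
        # A get pins the object and marks it as most recently used.
        self.sizes.move_to_end(object_id)

    def release(self, client_id, object_id):
        counts = self.refs[object_id]
        if counts[client_id] == 0:
            raise ValueError(f"{client_id!r} holds no reference to {object_id!r}")
        counts[client_id] -= 1
        if counts[client_id] == 0:
            del counts[client_id]

    def delete(self, object_id):
        # Toy choice: refuse to delete an object that some client still references.
        if self.in_use(object_id):
            raise ValueError(f"object {object_id!r} is still in use")
        self._remove(object_id)

    def disconnect(self, client_id):
        # A client that exits or dies loses all of its references at once.
        for counts in self.refs.values():
            counts.pop(client_id, None)


store = ToyObjectStore(capacity_bytes=100)
store.create("a", "x", 60)
store.release("a", "x")      # "x" is now unreferenced, hence evictable
store.create("b", "y", 30)   # fits without eviction: 90 of 100 bytes used
store.release("b", "y")
store.get("a", "y")          # client "a" pins "y" and makes it most recently used
store.create("b", "z", 50)   # needs 40 more bytes: evicts "x" (LRU, unreferenced); "y" survives
print(store.contains("x"), store.contains("y"))  # False True
store.disconnect("a")        # "a" exits, so its reference to "y" is dropped
print(store.in_use("y"))     # False
```

Computing the full victim set before evicting anything keeps a failed `create` from discarding objects for nothing; the thread does not say whether the real store behaves this way, so treat that detail as a design choice of the sketch.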