Anurag Khandelwal created ARROW-4294:
----------------------------------------
Summary: [Plasma] Add support for evicting objects to external store
Key: ARROW-4294
URL: https://issues.apache.org/jira/browse/ARROW-4294
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Affects Versions: 0.11.1
Reporter: Anurag Khandelwal
Fix For: 0.13.0
Currently, when Plasma needs storage space for additional objects, it evicts
objects by deleting them from the Plasma store. This is a problem when it isn't
possible to reconstruct the object or reconstructing it is expensive. Adding
support for a pluggable external store that Plasma can evict objects to will
address this issue.
My proposal is described below.
*Requirements*
* Objects in Plasma should be evicted to an external store rather than being
removed altogether
* Communication with the external storage service should go through a very
thin shim interface. At the same time, the interface should be general enough
to support arbitrary remote services (e.g., S3, DynamoDB, Redis, etc.)
* Should be pluggable (e.g., it should be simple to add in or remove the
external storage service for eviction, switch between different remote
services, etc.) and easy to implement
*Assumptions/Non-Requirements*
* The external store has practically infinite storage
* The external store's write operation is idempotent and atomic; this is
needed to ensure there are no race conditions due to multiple concurrent
evictions of the same object.
*Proposed Implementation*
* Define an ExternalStore interface with a Connect call. The call returns an
ExternalStoreHandle that exposes Put and Get calls. Any external store that
needs to be supported has to implement this interface.
* In order to read or write data to the external store in a thread-safe
manner, one ExternalStoreHandle should be created per thread. While the
ExternalStoreHandle itself is not required to be thread-safe, multiple
ExternalStoreHandles across multiple threads should be able to modify the
external store in a thread-safe manner.
* Replace the DeleteObjects method in the Plasma Store with an EvictObjects
method. If an external store is specified for the Plasma store, the
EvictObjects method would mark the object state as PLASMA_EVICTED, write the
object data to the external store (via the ExternalStoreHandle) and reclaim the
memory associated with the object data/metadata rather than remove the entry
from the Object Table altogether. In case there is no valid external store, the
eviction path would remain the same (i.e., the object entry is still deleted
from the Object Table).
* The Get method in the Plasma Store would try to fetch the object from the
external store if it is not found locally and there is an external store
associated with the Plasma Store. The method would try to offload the fetch to
an external worker thread pool with a fire-and-forget model, but may need to
perform it synchronously if too many requests are already enqueued.
* The CMake build system can expose a variable, EXTERNAL_STORE_SOURCES, to
which implementations of the ExternalStore and ExternalStoreHandle interfaces
can be appended; these are then compiled into the plasma_store_server
executable.
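
To make the proposal above concrete, here is a minimal C++ sketch of the interface and of the eviction and Get paths. It is an illustration under stated assumptions, not the actual Plasma code. Only the names ExternalStore, ExternalStoreHandle, Connect, Put, Get and PLASMA_EVICTED come from the proposal. The signatures, the string-based ObjectID, the PLASMA_SEALED state, InMemoryStore and ToyPlasmaStore are hypothetical stand-ins. The worker thread pool is omitted, so the fetch in Get is always synchronous here.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>

using ObjectID = std::string;  // assumption: stand-in for Plasma's ObjectID

// Per-thread handle; signatures are hypothetical.
class ExternalStoreHandle {
 public:
  virtual ~ExternalStoreHandle() = default;
  // Expected to be idempotent and atomic (see Assumptions).
  virtual bool Put(const ObjectID& id, const std::string& data) = 0;
  // Returns false if the object is not in the external store.
  virtual bool Get(const ObjectID& id, std::string* data) = 0;
};

class ExternalStore {
 public:
  virtual ~ExternalStore() = default;
  virtual std::unique_ptr<ExternalStoreHandle> Connect(
      const std::string& endpoint) = 0;
};

// Hypothetical in-memory backend standing in for S3, Redis, etc.
class InMemoryStore : public ExternalStore {
 public:
  std::unique_ptr<ExternalStoreHandle> Connect(const std::string&) override {
    return std::unique_ptr<ExternalStoreHandle>(new Handle(this));
  }

 private:
  class Handle : public ExternalStoreHandle {
   public:
    explicit Handle(InMemoryStore* store) : store_(store) {}
    bool Put(const ObjectID& id, const std::string& data) override {
      std::lock_guard<std::mutex> lock(store_->mutex_);
      store_->objects_[id] = data;  // overwriting makes repeated Puts idempotent
      return true;
    }
    bool Get(const ObjectID& id, std::string* data) override {
      std::lock_guard<std::mutex> lock(store_->mutex_);
      auto it = store_->objects_.find(id);
      if (it == store_->objects_.end()) return false;
      *data = it->second;
      return true;
    }

   private:
    InMemoryStore* store_;
  };
  std::mutex mutex_;  // the backend, not the handle, provides thread safety
  std::map<ObjectID, std::string> objects_;
};

enum class ObjectState { PLASMA_SEALED, PLASMA_EVICTED };
struct ObjectEntry {
  ObjectState state;
  std::string data;
};

// Toy model of the object table; a null handle means "no external store".
class ToyPlasmaStore {
 public:
  explicit ToyPlasmaStore(ExternalStoreHandle* handle) : handle_(handle) {}

  void Create(const ObjectID& id, const std::string& data) {
    table_[id] = ObjectEntry{ObjectState::PLASMA_SEALED, data};
  }

  void EvictObject(const ObjectID& id) {
    auto it = table_.find(id);
    if (it == table_.end()) return;
    if (handle_ != nullptr && handle_->Put(id, it->second.data)) {
      it->second.state = ObjectState::PLASMA_EVICTED;  // keep the table entry
      it->second.data.clear();                         // reclaim the memory
    } else {
      // No external store: old behaviour. Deleting on a failed Put is an
      // assumption of this sketch; the proposal does not cover that case.
      table_.erase(it);
    }
  }

  bool Get(const ObjectID& id, std::string* out) {
    assert(out != nullptr);
    auto it = table_.find(id);
    if (it == table_.end()) return false;
    if (it->second.state == ObjectState::PLASMA_EVICTED) {
      if (handle_ == nullptr || !handle_->Get(id, &it->second.data)) return false;
      it->second.state = ObjectState::PLASMA_SEALED;  // back in local memory
    }
    *out = it->second.data;
    return true;
  }

  bool IsEvicted(const ObjectID& id) const {
    auto it = table_.find(id);
    return it != table_.end() && it->second.state == ObjectState::PLASMA_EVICTED;
  }

 private:
  ExternalStoreHandle* handle_;
  std::map<ObjectID, ObjectEntry> table_;
};
```

In this sketch, a caller would obtain a handle through InMemoryStore::Connect, pass it to ToyPlasmaStore, and observe that an evicted object remains retrievable through Get, while a store constructed with a null handle simply drops the object on eviction.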