ivotron opened a new pull request #8647:
URL: https://github.com/apache/arrow/pull/8647


   This PR contains a basic implementation of the Dataset API on Ceph that uses 
the librados C++ API to defer evaluation of expressions to a RADOS storage 
backend. The storage-side code is included, as well as unit and integration 
tests.
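   As a rough illustration of what deferring evaluation involves, the sketch 
below encodes a scan request (a filter plus a column projection) into an opaque 
byte string on the client and decodes it on the storage side. The `ScanRequest` 
struct, its fields, and the length-prefixed wire format are hypothetical and are 
not taken from this PR's code. `std::string` stands in for the 
`ceph::bufferlist` that librados uses for method inputs and outputs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <optional>
#include <string>
#include <vector>

// Hypothetical scan request: keep rows whose column `filter_column` is
// greater than `threshold`, and return only the columns in `projection`.
struct ScanRequest {
  std::uint32_t filter_column = 0;
  std::int64_t threshold = 0;
  std::vector<std::uint32_t> projection;

  bool operator==(const ScanRequest& other) const {
    return filter_column == other.filter_column &&
           threshold == other.threshold && projection == other.projection;
  }
};

// Append the raw bytes of a fixed-size value (host byte order; a real
// wire format would pin down endianness).
template <typename T>
void Put(std::string* buf, T value) {
  char bytes[sizeof(T)];
  std::memcpy(bytes, &value, sizeof(T));
  buf->append(bytes, sizeof(T));
}

// Read a fixed-size value at *pos and advance; false if too few bytes remain.
template <typename T>
bool Get(const std::string& buf, std::size_t* pos, T* value) {
  if (buf.size() - *pos < sizeof(T)) return false;
  std::memcpy(value, buf.data() + *pos, sizeof(T));
  *pos += sizeof(T);
  return true;
}

// Client side: serialize the request into an opaque payload.
std::string Encode(const ScanRequest& req) {
  // The column count is sent as 32 bits.
  assert(req.projection.size() <= std::numeric_limits<std::uint32_t>::max());
  std::string buf;
  Put(&buf, req.filter_column);
  Put(&buf, req.threshold);
  Put(&buf, static_cast<std::uint32_t>(req.projection.size()));
  for (std::uint32_t col : req.projection) Put(&buf, col);
  return buf;
}

// Storage side: parse the payload; std::nullopt signals a malformed request.
std::optional<ScanRequest> Decode(const std::string& buf) {
  ScanRequest req;
  std::size_t pos = 0;
  std::uint32_t count = 0;
  if (!Get(buf, &pos, &req.filter_column) || !Get(buf, &pos, &req.threshold) ||
      !Get(buf, &pos, &count)) {
    return std::nullopt;
  }
  for (std::uint32_t i = 0; i < count; ++i) {
    std::uint32_t col = 0;
    if (!Get(buf, &pos, &col)) return std::nullopt;
    req.projection.push_back(col);
  }
  if (pos != buf.size()) return std::nullopt;  // reject trailing bytes
  return req;
}
```

A client would pass `Encode(request)` as the input payload of the storage-side 
call. Rejecting trailing bytes in `Decode` is a cheap guard against a client and 
a storage class that disagree about the format.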
   
   The Dataset API is implemented on RADOS by adding two new classes, 
RadosDataset and RadosFragment. A scanning operation triggers the evaluation of 
expressions on the storage side. The PR also includes a wrapper for the librados 
library, along with a mock that allows unit tests to run without a Ceph 
instance. Integration tests have been modified to install Ceph and run without 
the mock, against a single-node Ceph "cluster".
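   The wrapper-plus-mock arrangement can be sketched as an abstract interface 
that the dataset code depends on, with an in-memory implementation for unit 
tests. All names here (`RadosInterface`, `MockRados`, `ScanObject`, and the 
`"arrow"`/`"scan"` class and method names) are hypothetical. The `Exec` 
signature loosely mirrors `librados::IoCtx::exec(oid, cls, method, inbl, outbl)`, 
with `std::string` in place of `ceph::bufferlist`.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical thin wrapper over the single librados call a fragment needs.
class RadosInterface {
 public:
  virtual ~RadosInterface() = default;
  // Returns 0 on success or a negative errno value.
  virtual int Exec(const std::string& oid, const std::string& cls,
                   const std::string& method, const std::string& in,
                   std::string* out) = 0;
};

// Storage-side logic as the mock sees it: object bytes + request -> result.
using Handler = std::function<int(const std::string& object,
                                  const std::string& in, std::string* out)>;

// In-memory mock: objects live in a map, and "cls.method" calls dispatch to
// handlers registered by the test, so no Ceph cluster is required.
class MockRados : public RadosInterface {
 public:
  void PutObject(const std::string& oid, std::string data) {
    objects_[oid] = std::move(data);
  }

  void RegisterMethod(const std::string& cls, const std::string& method,
                      Handler handler) {
    handlers_[cls + "." + method] = std::move(handler);
  }

  int Exec(const std::string& oid, const std::string& cls,
           const std::string& method, const std::string& in,
           std::string* out) override {
    assert(out != nullptr);
    auto object = objects_.find(oid);
    if (object == objects_.end()) return -ENOENT;
    auto handler = handlers_.find(cls + "." + method);
    if (handler == handlers_.end()) return -EOPNOTSUPP;  // the mock's choice
    return handler->second(object->second, in, out);
  }

 private:
  std::map<std::string, std::string> objects_;
  std::map<std::string, Handler> handlers_;
};

// Code under test depends only on RadosInterface, so the same path runs
// against MockRados in unit tests and a librados-backed class in production.
int ScanObject(RadosInterface* rados, const std::string& oid,
               const std::string& request, std::string* result) {
  return rados->Exec(oid, "arrow", "scan", request, result);
}
```

A production implementation would forward `Exec` to a real `librados::IoCtx`, 
while unit tests register handlers on `MockRados` and exercise the same 
`ScanObject` path without a cluster.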
   
   The storage-side code is implemented as a RADOS CLS (object storage class) 
using Ceph's [RADOS 
SDK](https://docs.ceph.com/en/octopus/architecture/#extending-ceph). The code 
lives in `cpp/src/arrow/adapters/arrow-rados-cls/` and is expected to be 
deployed on the storage nodes (Ceph's OSDs) before operating on tables 
through the RadosDataset implementation. This PR includes a CMake configuration 
for optionally building this library (the `ARROW_CLS` CMake option).
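   A real CLS method is a function of the form 
`int method(cls_method_context_t hctx, ceph::bufferlist* in, ceph::bufferlist* out)`, 
registered inside `CLS_INIT` with `cls_register_cxx_method` and typically 
reading the object via `cls_cxx_read`. The sketch below keeps only the 
evaluation logic, in plain C++ so it compiles without Ceph headers: a 
hypothetical method that filters rows stored as raw `int64_t` values. The object 
layout, the method, and the helper names are assumptions for illustration, not 
the format used by `arrow-rados-cls`.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical object layout: a flat array of int64_t values (host byte order).
std::string EncodeRows(const std::vector<std::int64_t>& rows) {
  std::string bytes;
  for (std::int64_t v : rows) {
    bytes.append(reinterpret_cast<const char*>(&v), sizeof(v));
  }
  return bytes;
}

// Stand-in for reading the object (a real CLS would call cls_cxx_read).
// Any trailing partial value is ignored; callers check the size first.
std::vector<std::int64_t> DecodeRows(const std::string& bytes) {
  std::vector<std::int64_t> rows;
  for (std::size_t off = 0; off + sizeof(std::int64_t) <= bytes.size();
       off += sizeof(std::int64_t)) {
    std::int64_t v = 0;
    std::memcpy(&v, bytes.data() + off, sizeof(v));
    rows.push_back(v);
  }
  return rows;
}

// Hypothetical storage-side method: `in` holds one int64_t threshold and
// `out` receives only the rows greater than it, so non-matching rows never
// leave the OSD. Returns 0 or a negative errno, following the CLS convention.
int FilterGreaterThan(const std::string& object, const std::string& in,
                      std::string* out) {
  assert(out != nullptr);
  if (in.size() != sizeof(std::int64_t)) return -EINVAL;       // bad request
  if (object.size() % sizeof(std::int64_t) != 0) return -EIO;  // corrupt object
  std::int64_t threshold = 0;
  std::memcpy(&threshold, in.data(), sizeof(threshold));
  std::vector<std::int64_t> matches;
  for (std::int64_t v : DecodeRows(object)) {
    if (v > threshold) matches.push_back(v);
  }
  *out = EncodeRows(matches);
  return 0;
}
```

Because only matching rows are written to `out`, the bytes sent back to the 
client scale with the selectivity of the filter rather than with the size of 
the object, which is the point of evaluating on the OSD.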
   
   Follow-up work includes dataset discovery, Python bindings, striping of 
large fragments, IPC improvements on the backend, and a Python library for 
writing tables.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
