> as far as we can tell, this filesystem layer
> is unaware of expressions, record batches, etc.
You're correct that the filesystem layer doesn't directly support expressions. However, the datasets API includes the Partitioning classes, which embed expressions in paths. Depending on what expressions you need to embed, you could implement a RadosFileSystem class which wraps an IO context and treats object names as paths. If the RADOS objects contain Arrow-formatted data, then a FileSystemDataset (using IpcFileFormat) can be constructed which views the IO context and exploits the partitioning information embedded in object names to accelerate filtering. Does that accommodate your use case?

> Our main concern is that this new arrow::dataset::RadosFormat class will be
> deriving from the arrow::dataset::FileFormat class, which seems to raise a
> conceptual mismatch as there isn't really a RADOS format

IIUC, RADOS doesn't interact with a filesystem directly, so RadosFileFormat would indeed be a conceptually problematic point of extension. If a RADOS file system is not viable, then I think the ideal approach would be to directly implement the Fragment [1] and Dataset [2] interfaces, forgoing a FileFormat implementation altogether. Unfortunately, the only example we have of this approach is InMemoryFragment, which simply wraps a vector of record batches.

[1] https://github.com/apache/arrow/blob/975f4eb/cpp/src/arrow/dataset/dataset.h#L45-L90
[2] https://github.com/apache/arrow/blob/975f4eb/cpp/src/arrow/dataset/dataset.h#L119-L158

On Fri, Aug 28, 2020 at 1:27 PM Ivo Jimenez <ivo.jime...@gmail.com> wrote:

> Hi Antoine
>
> > > Yes, that is our plan. Since this is going to be done on the storage-,
> > > server-side, this would be transparent to the client. So our main
> > > concern is whether this would be OK from the design perspective, and
> > > could this eventually be merged upstream?
> >
> > Arrow datasets have no notion of client and server, so I'm not sure what
> > you mean here.
>
> Sorry for the confusion.
> This is where we see a mismatch between the
> current design and what we are trying to achieve.
>
> Our goal is to push down computations in a cloud storage system. By
> pushing we mean actually sending computation tasks to storage nodes
> (e.g. filtering executed on storage nodes). Ideally this would be done by
> implementing a new plugin for arrow::fs, but as far as we can tell, this
> filesystem layer is unaware of expressions, record batches, etc., so this
> information cannot be communicated down to storage.
>
> So what we thought would work is to implement this at the Dataset API
> level, and implement a scanner (and writer) that would defer these
> operations to storage nodes. For example, the RadosScanTask class will
> ask a storage node to actually do a scan and fetch the result, as opposed
> to doing the scan locally.
>
> We would immensely appreciate it if you could let us know if the above is
> OK, or if you think there is a better alternative for accomplishing this,
> as we would rather implement this functionality in a way that is
> compatible with your overall vision.
>
> > Do you simply mean contributing RadosFormat to the Arrow
> > codebase?
>
> Yes, so that others wanting to achieve this on a Ceph cluster could
> leverage this as well.
>
> > I would say that depends on the required dependencies, and
> > ease of testing (and/or CI) for other developers.
>
> OK, yes, we will pay attention to these aspects as part of an eventual
> PR. We will include tests and ensure that CI covers the changes we
> introduce.
>
> thanks!