Here is an interesting discussion about sharing data in columnar storage between two applications: https://github.com/apache/spark/pull/15219#issuecomment-265835049
One of the ideas is to prepare separate interfaces (or traits) for reading and for writing, so that each application implements only the class it needs (e.g. read or write). For example, FiloDB wants to provide columnar storage that can be read from Spark; in that case, it would only have to implement the read APIs for Spark. These two interfaces can be prepared.

However, this may lead to an incompatibility in ColumnarBatch. ColumnarBatch keeps a set of ColumnVectors that can be read or written, so the ColumnVector class has to expose both read and write APIs. How can we put a new ColumnVector that has only read APIs into a ColumnarBatch? Here is an example that causes this incompatibility: https://gist.github.com/kiszk/00ab7d0c69f0e598e383cdc8e72bcc4d (a rough sketch of the problem also appears below).

Another possible idea is that both applications support the Apache Arrow APIs. Other approaches are also possible. What approach would be good for all of these applications?
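For concreteness, here is a minimal Scala sketch of the first idea. All names (ReadableColumnVector, WritableColumnVector, FiloReadOnlyVector) are hypothetical; they only roughly mirror the shape of Spark's ColumnVector and ColumnarBatch and are not the actual API.

// Hypothetical names for illustration only; not Spark's actual classes.

// Read-only view of a column: enough for a data source such as FiloDB
// to expose columnar data to Spark without implementing write paths.
trait ReadableColumnVector {
  def getInt(rowId: Int): Int
  def getDouble(rowId: Int): Double
  def isNullAt(rowId: Int): Boolean
  def numRows: Int
}

// Write APIs live in a separate trait, so only writers implement them.
trait WritableColumnVector extends ReadableColumnVector {
  def putInt(rowId: Int, value: Int): Unit
  def putDouble(rowId: Int, value: Double): Unit
  def putNull(rowId: Int): Unit
}

// The incompatibility: if ColumnarBatch is declared against the
// writable type, a read-only implementation cannot be placed into it.
class ColumnarBatch(columns: Array[WritableColumnVector]) {
  def column(ordinal: Int): WritableColumnVector = columns(ordinal)
}

// A read-only vector backed by an existing buffer.
class FiloReadOnlyVector(data: Array[Int]) extends ReadableColumnVector {
  override def getInt(rowId: Int): Int = data(rowId)
  override def getDouble(rowId: Int): Double = data(rowId).toDouble
  override def isNullAt(rowId: Int): Boolean = false
  override def numRows: Int = data.length
}

// new ColumnarBatch(Array(new FiloReadOnlyVector(Array(1, 2, 3))))
//   does not compile: FiloReadOnlyVector is not a WritableColumnVector.
// Declaring ColumnarBatch against ReadableColumnVector instead would fix
// reads, but then code that writes through batch.column(i) breaks.

Regards,
Kazuaki Ishizaki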