Here is an interesting discussion about sharing columnar-storage data 
between two applications:
https://github.com/apache/spark/pull/15219#issuecomment-265835049

One of the ideas is to prepare separate interfaces (or traits) for 
reading and for writing. Each application can then implement only the 
class it needs (e.g. read or write). For example, FiloDB wants to 
provide a columnar storage that can be read from Spark; in that case, 
it only has to implement the read APIs for Spark. These two classes can 
be prepared independently.
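As a rough sketch of this split (all names here are hypothetical, not 
Spark's actual API), a read-only interface covers everything a consumer 
needs, while producers additionally implement a write interface:

```java
// Hypothetical sketch: separate read and write interfaces for a column
// vector. None of these names are Spark's actual API.

// Read-only view: all a consumer (e.g. Spark reading FiloDB) needs.
interface ReadableColumnVector {
    int getInt(int rowId);
    boolean isNullAt(int rowId);
}

// Write side: only producers need to implement this.
interface WritableColumnVector extends ReadableColumnVector {
    void putInt(int rowId, int value);
    void putNull(int rowId);
}

// A storage engine that only serves reads implements just the read trait.
class FiloDbLikeVector implements ReadableColumnVector {
    private final int[] data;
    FiloDbLikeVector(int[] data) { this.data = data; }
    public int getInt(int rowId) { return data[rowId]; }
    public boolean isNullAt(int rowId) { return false; }
}

public class SplitSketch {
    public static void main(String[] args) {
        ReadableColumnVector v = new FiloDbLikeVector(new int[]{10, 20, 30});
        System.out.println(v.getInt(1)); // prints 20
    }
}
```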
However, it may lead to an incompatibility in ColumnarBatch. 
ColumnarBatch keeps a set of ColumnVectors that can be read or written, 
so the ColumnVector class must have both read and write APIs. How can we 
plug in a new ColumnVector that has only read APIs? Here is an example 
of this incompatibility: 
https://gist.github.com/kiszk/00ab7d0c69f0e598e383cdc8e72bcc4d
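To illustrate the problem (again with hypothetical names, not Spark's 
actual API): if the batch is typed over a single read-write ColumnVector 
class, a read-only implementation cannot be stored in it, because it 
would have to provide write APIs it does not have:

```java
// Hypothetical sketch of the incompatibility; names are illustrative,
// not Spark's actual API.
interface ReadableColumnVector {
    int getInt(int rowId);
}

// A single class carrying both read and write APIs, mirroring how a
// batch might expect its vectors today.
abstract class ColumnVector implements ReadableColumnVector {
    public abstract void putInt(int rowId, int value);
}

// The batch keeps a set of read-write ColumnVectors.
class ColumnarBatchSketch {
    private final ColumnVector[] columns;
    ColumnarBatchSketch(ColumnVector[] columns) { this.columns = columns; }
    ColumnVector column(int ordinal) { return columns[ordinal]; }
}

// A read-only implementation from an external storage engine.
class ReadOnlyVector implements ReadableColumnVector {
    public int getInt(int rowId) { return 42; }
}

public class IncompatSketch {
    public static void main(String[] args) {
        // new ColumnarBatchSketch(new ColumnVector[]{ new ReadOnlyVector() });
        // ^ does not compile: ReadOnlyVector is not a ColumnVector,
        //   because ColumnVector also demands write APIs.
        System.out.println(new ReadOnlyVector().getInt(0)); // prints 42
    }
}
```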

Another possible idea is that both applications support the Apache Arrow 
APIs.
Other approaches are also possible.

Which approach would be good for all of these applications?

Regards,
Kazuaki Ishizaki
