Not so much about sharing between applications, but rather between multiple frameworks within one application; still related: https://cs.stanford.edu/~matei/papers/2017/cidr_weld.pdf
On Sun, Dec 25, 2016 at 8:12 PM, Kazuaki Ishizaki <ishiz...@jp.ibm.com> wrote:
> Here is an interesting discussion about sharing data in columnar storage
> between two applications:
> https://github.com/apache/spark/pull/15219#issuecomment-265835049
>
> One of the ideas is to prepare interfaces (or traits) for read-only or
> write-only access. Each application can then implement only the class for
> what it wants to do (e.g. read or write). For example, FiloDB wants to
> provide a columnar storage that can be read from Spark. In that case, it is
> easy to implement only the read APIs for Spark, and the two classes can be
> prepared separately.
> However, this may lead to incompatibility in ColumnarBatch. ColumnarBatch
> keeps a set of ColumnVector instances that can be read or written, so the
> ColumnVector class should have both read and write APIs. How can we plug in
> a new ColumnVector with only read APIs? Here is an example showing the
> incompatibility:
> https://gist.github.com/kiszk/00ab7d0c69f0e598e383cdc8e72bcc4d
>
> Another possible idea is that both applications support Apache Arrow APIs.
> Other approaches could also work.
>
> What approach would be good for all of these applications?
>
> Regards,
> Kazuaki Ishizaki
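
To make the read/write split from the quoted message concrete, here is a rough sketch of how the interface hierarchy could look. All names here (ReadOnlyColumnVector, WritableColumnVector, IntArrayColumn) are illustrative, not Spark's actual API; a storage engine like FiloDB would implement only the read side, while batch-consuming code accepts the read-only type:

```java
// Hypothetical split of column access into a read-only interface and a
// writable sub-interface, so that a read-only storage engine can be
// plugged into code that only reads.

interface ReadOnlyColumnVector {
    int numRows();
    int getInt(int rowId);
}

// Writers get the read API plus mutation methods.
interface WritableColumnVector extends ReadOnlyColumnVector {
    void putInt(int rowId, int value);
}

// A read-only backing store implements just the read interface.
final class IntArrayColumn implements ReadOnlyColumnVector {
    private final int[] data;
    IntArrayColumn(int[] data) { this.data = data; }
    public int numRows() { return data.length; }
    public int getInt(int rowId) { return data[rowId]; }
}

public class ColumnVectorSketch {
    // Read-only consumers (e.g. a scan over a ColumnarBatch) declare the
    // read-only type, so they accept either kind of vector.
    static long sumColumn(ReadOnlyColumnVector col) {
        long sum = 0;
        for (int i = 0; i < col.numRows(); i++) {
            sum += col.getInt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        ReadOnlyColumnVector col = new IntArrayColumn(new int[]{1, 2, 3});
        System.out.println(sumColumn(col)); // prints 6
    }
}
```

The incompatibility the message describes shows up when a batch container hard-codes the writable type: a method taking WritableColumnVector cannot accept an IntArrayColumn, so the container itself would need to be declared against the read-only interface.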