Yes, this is part of Matei's current research, for which code is not yet
publicly available at all, much less in a form suitable for production use.

On Mon, Dec 26, 2016 at 2:29 AM, Evan Chan <vel...@gmail.com> wrote:

> Looks pretty interesting, but might take a while honestly.
>
> On Dec 25, 2016, at 5:24 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>
> Not so much about sharing between applications, but rather about multiple
> frameworks within an application, though still related:
> https://cs.stanford.edu/~matei/papers/2017/cidr_weld.pdf
>
> On Sun, Dec 25, 2016 at 8:12 PM, Kazuaki Ishizaki <ishiz...@jp.ibm.com>
> wrote:
>
>> Here is an interesting discussion about sharing data in columnar storage
>> between two applications:
>> https://github.com/apache/spark/pull/15219#issuecomment-265835049
>>
>> One idea is to provide separate interfaces (or traits) for reading and
>> writing, so that each application only has to implement the one it
>> actually needs. For example, FiloDB wants to provide columnar storage
>> that can be read from Spark; in that case it only needs to implement the
>> read APIs. However, this may lead to an incompatibility with
>> ColumnarBatch. ColumnarBatch keeps a set of ColumnVectors that can be
>> both read and written, so the ColumnVector class has to expose read and
>> write APIs. How can we plug in a new ColumnVector that has only read
>> APIs? Here is an example that causes the incompatibility:
>> https://gist.github.com/kiszk/00ab7d0c69f0e598e383cdc8e72bcc4d
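>>
>> To make the idea concrete, here is a minimal Scala sketch of the
>> read/write split (the trait and method names are hypothetical, not the
>> actual Spark API):
>>
>> // Read-only side: enough for a store that is only read from Spark.
>> trait ReadableColumnVector {
>>   def getInt(rowId: Int): Int
>>   def isNullAt(rowId: Int): Boolean
>> }
>>
>> // Write side extends it; ColumnarBatch currently expects this full API.
>> trait WritableColumnVector extends ReadableColumnVector {
>>   def putInt(rowId: Int, value: Int): Unit
>>   def putNull(rowId: Int): Unit
>> }
>>
>> // An external store such as FiloDB would only implement the read side.
>> class FiloDBColumnVector(data: Array[Int], nulls: Array[Boolean])
>>     extends ReadableColumnVector {
>>   override def getInt(rowId: Int): Int = data(rowId)
>>   override def isNullAt(rowId: Int): Boolean = nulls(rowId)
>> }
>>
>> With such a split, Spark code that only reads could accept a
>> ReadableColumnVector, while ColumnarBatch could keep the writable type
>> internally; the incompatibility above comes from it requiring the full
>> read/write class today.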
>>
>> Another possible idea is for both applications to support the Apache
>> Arrow APIs. There could be other approaches as well.
>>
>> What approach would work well for all of these applications?
>>
>> Regards,
>> Kazuaki Ishizaki
>>
>
>
>
