Re: new to Arrow / integration with Kudu

Wes McKinney Wed, 22 Jan 2020 11:32:18 -0800

On Wed, Jan 22, 2020 at 12:28 PM Shazz <[email protected]> wrote:
>
> Thanks Wes,
>
> I will follow what is happening between Arrow and Kudu.
> In the short term, if you had to pick a storage layer for Arrow data that
> performs well enough and is not too costly to operate, what would you
> choose? I saw there is an example that stores Parquet files on Azure
> Blob Storage. Would that be OK to start with, or is there a better choice?


Many people are doing that. Note that you'll need to do some tuning
(e.g. read buffering) to get acceptable performance against object
stores like Azure Blob Storage (ABS).
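
To give a rough idea of why read buffering matters, here is a minimal
stdlib-only Python sketch. It does not use Arrow or any Azure API:
CountingRawStore is a hypothetical stand-in for a remote blob that just
counts read requests, and the 64-byte read size and 256 KiB buffer are
arbitrary assumptions for illustration. Against a real object store,
each request would cost a network round trip.

```python
import io


class CountingRawStore(io.RawIOBase):
    """Hypothetical stand-in for a remote blob: counts every read request."""

    def __init__(self, payload: bytes):
        super().__init__()
        self._payload = payload
        self._pos = 0
        self.requests = 0  # each call would be one round trip to the store

    def readable(self):
        return True

    def readinto(self, b):
        self.requests += 1
        chunk = self._payload[self._pos:self._pos + len(b)]
        b[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


def read_in_small_pieces(stream, piece_size=64):
    """Consume the stream in many small reads, as a file-format reader might."""
    total = 0
    while True:
        piece = stream.read(piece_size)
        if not piece:
            break
        total += len(piece)
    return total


payload = bytes(1024 * 1024)  # 1 MiB of dummy data

# Unbuffered: every small read goes straight to the "remote" store.
unbuffered = CountingRawStore(payload)
unbuffered_total = read_in_small_pieces(unbuffered)
unbuffered_requests = unbuffered.requests

# Buffered: small reads are served from a 256 KiB in-memory buffer.
raw = CountingRawStore(payload)
buffered_total = read_in_small_pieces(
    io.BufferedReader(raw, buffer_size=256 * 1024)
)
buffered_requests = raw.requests

print("unbuffered requests:", unbuffered_requests)
print("buffered requests:  ", buffered_requests)
```

Both runs read the same bytes; only the number of requests hitting the
store differs. The right buffer size for a real workload depends on the
file layout and the store's latency, so treat the numbers here as
placeholders.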

> ---
> [email protected]
> GPG public key ID : B517C4C8
>
> On 21/01/2020 17:54, Wes McKinney wrote:
> > I'm interested to see an Arrow adapter for Apache Kudu developed. My
> > gut feeling is that this work should be undertaken in Kudu itself,
> > potentially having the tablet servers producing Arrow Record Batches
> > locally and sending them to the client rather than converting to
> > Kudu's own on-the-wire record format and then deserializing into Arrow
> > on the receiver side. It might be worth a conversation with the Kudu
> > community to see what they think.
> >
> > Of course, one can build an Arrow deserializer for the current Kudu C++
> > client API and probably get pretty good performance. See also
> > ARROW-814:
> >
> > https://issues.apache.org/jira/browse/ARROW-814
> >
> > On Tue, Jan 21, 2020 at 12:32 PM Shazz <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> I'm designing an architecture to efficiently store and access tabular
> >> data, and I was told to look at Arrow and Kudu.
> >> I saw a diagram on the front page showing that Arrow can be integrated
> >> with Kudu, but I found nothing in the documentation. Is there an example
> >> available somewhere?
> >>
> >> Thanks !
> >>
> >> --
> >> [email protected]
> >> GPG public key ID : B517C4C8
