On Wed, Jan 22, 2020 at 12:28 PM Shazz <sh...@metaverse.fr> wrote:
>
> Thanks Wes,
>
> I will follow what is happening between Arrow and Kudu.
> In the short term, if you would have to define a storage for Arrow which
> has good (enough) performance, not too costly to operate... what would
> you choose ? I saw there is an example to store Parquet files on Azure
> Blob Storage, would it be ok to start ? Or there is a better choice ?

Many people are doing that. Note that you'll need to do some tuning
(e.g. read buffering) to obtain acceptable performance against object
stores such as Azure Blob Storage (ABS).

> ---
> sh...@metaverse.fr
> GPG public key ID : B517C4C8
>
> On 21/01/2020 17:54, Wes McKinney wrote:
> > I'm interested to see an Arrow adapter for Apache Kudu developed. My
> > gut feeling is that this work should be undertaken in Kudu itself,
> > potentially having the tablet servers producing Arrow Record Batches
> > locally and sending them to the client rather than converting to
> > Kudu's own on-the-wire record format and then deserializing into Arrow
> > on the receiver side. It might be worth a conversation with the Kudu
> > community to see what they think.
> >
> > Of course one can build an Arrow deserializer for the current Kudu C++
> > client API and probably get pretty good performance. see also
> > ARROW-814
> >
> > https://issues.apache.org/jira/browse/ARROW-814
> >
> > On Tue, Jan 21, 2020 at 12:32 PM Shazz <sh...@metaverse.fr> wrote:
> >>
> >> Hi,
> >>
> >> I'm thinking of an architecture to store and access efficiently
> >> tabular data and I was told to look at Arrow and Kudu.
> >> I saw on the frontpage a diagram where Arrow can be integrated with
> >> Kudu but nothing in the documentation. Is there an example available
> >> somewhere ?
> >>
> >> Thanks !
> >>
> >> --
> >> sh...@metaverse.fr
> >> GPG public key ID : B517C4C8
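
On the read-buffering point in the reply at the top: the rough idea is
that a Parquet reader issues many small reads, and against a blob store
each one can become a separate network round trip. The sketch below
illustrates this with the Python standard library only. It does not call
Arrow or Azure; `CountingBlob` is a hypothetical stand-in for a remote
blob that counts every raw read as one request, and the 4 KiB / 256 KiB
sizes are arbitrary assumptions, not recommended settings. (pyarrow's
Parquet readers accept a `buffer_size` argument for the same purpose;
check the documentation for the version you use.)

```python
import io


class CountingBlob(io.RawIOBase):
    """Hypothetical stand-in for a remote blob.

    Each call to readinto() is counted as one round trip to storage.
    """

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0
        self.requests = 0

    def readable(self):
        return True

    def readinto(self, b):
        # One "network request" per raw read, however small it is.
        self.requests += 1
        chunk = self._data[self._pos:self._pos + len(b)]
        b[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


def read_in_small_pieces(stream, piece_size):
    """Consume the stream in small reads, as a columnar reader might."""
    total = 0
    while True:
        piece = stream.read(piece_size)
        if not piece:
            break
        total += len(piece)
    return total


payload = bytes(1024 * 1024)  # 1 MiB of dummy data

# Unbuffered: every 4 KiB read goes straight to the "blob".
unbuffered = CountingBlob(payload)
unbuffered_total = read_in_small_pieces(unbuffered, 4096)

# Buffered: the same 4 KiB reads are served from a 256 KiB buffer.
raw = CountingBlob(payload)
buffered = io.BufferedReader(raw, buffer_size=256 * 1024)
buffered_total = read_in_small_pieces(buffered, 4096)

print("requests without buffering:", unbuffered.requests)
print("requests with buffering:   ", raw.requests)
```

Both runs read the same bytes; only the number of requests reaching the
underlying store changes. A suitable buffer size for real workloads
depends on row group and column chunk sizes, so treat the numbers here
as placeholders.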