Hi Tewfik,

It would be good to step back a bit and explain what your data is, and
what the consumer is going to do with it.

Regards

Antoine.


On Fri, 14 Feb 2020 15:08:57 -0800
Tewfik Zeghmi <zeg...@gmail.com> wrote:
> Hi Micah,
>
> The primary language is Python. I'm hoping that the overhead of
> metadata is small compared to the schema information.
>
> thank you!
>
> On Fri, Feb 14, 2020 at 3:07 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
> > Hi Tewfik,
> >
> > What language? It is possible to serialize them separately, but the
> > right hooks might not be exposed in all languages.
> >
> > There is still going to be a higher overhead for single row values in
> > Arrow compared to Avro due to metadata requirements.
> >
> > Thanks,
> > Micah
> >
> > On Fri, Feb 14, 2020 at 1:33 PM Tewfik Zeghmi <zeg...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I have a use case of creating a feature store to serve low latency
> > > traffic. Given a key, we need the ability to save and read a feature
> > > vector in a low latency key-value store. Serializing an Arrow table
> > > with one row takes 1344 bytes, while the same single row serialized
> > > with Avro without the schema uses 236 bytes.
> > >
> > > Is it possible to serialize an Arrow table/RecordBatch independently
> > > of the schema? Ideally, we'd like to serialize the schema once and
> > > not along with every feature key, then be able to read the
> > > RecordBatch with the schema.
> > >
> > > thank you!