Hi Tewfik,

It would be good to step back a bit and explain what your data is, and
what the consumer is going to do with it.

Regards

Antoine.


On Fri, 14 Feb 2020 15:08:57 -0800
Tewfik Zeghmi <zeg...@gmail.com> wrote:
> Hi Micah,
> 
> The primary language is Python.  I'm hoping that the overhead of
> metadata is small compared to the schema information.
> 
> thank you!
> 
> On Fri, Feb 14, 2020 at 3:07 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
> 
> > Hi Tewfik,
> > What language?  It is possible to serialize them separately, but the right
> > hooks might not be exposed in all languages.
> >
> > There is still going to be a higher overhead for single row values in Arrow
> > compared to Avro due to metadata requirements.
> >
> > Thanks,
> > Micah
> >
> > On Fri, Feb 14, 2020 at 1:33 PM Tewfik Zeghmi <zeg...@gmail.com> wrote:
> >  
> > > Hi,
> > >
> > > I have a use case of creating a feature store to serve low-latency traffic.
> > > Given a key, we need the ability to save and read a feature vector in a
> > > low-latency key-value store. Serializing an Arrow table with one row takes
> > > 1344 bytes, while the same single row serialized with Avro without the
> > > schema uses 236 bytes.
> > >
> > > Is it possible to serialize an Arrow table/RecordBatch independently
> > > of the schema? Ideally, we'd like to serialize the schema once, not
> > > along with every feature key, and then be able to read the RecordBatch with
> > > the schema.
> > >
> > > thank you!
> > >  
> >  
> 
> 


