Re: Apache Arrow file format

Adam Lippai Tue, 17 Oct 2023 19:50:53 -0700

There is also Lance (https://github.com/lancedb/lance), which sits between
the two formats. Depending on the use case, it can be a great choice.


Best regards
Adam Lippai

On Tue, Oct 17, 2023 at 22:44 Matt Topol <[email protected]> wrote:

> One benefit of the Feather format (i.e. the Arrow IPC file format) is the
> ability to mmap the file, which makes it easy to read sections of a
> larger-than-memory file. Since, as Felipe mentioned, the format is focused
> on in-memory representation, you can simply mmap the file and use the raw
> bytes directly. For a large file that you only want to read sections of,
> this can be beneficial for IO and memory usage.
>
> Unfortunately, you are correct that it doesn't allow for easy column
> projection (you're going to read all the columns for a record batch in the
> file, no matter what). So it's going to be a trade-off based on your needs
> as to whether it makes sense, or if you should use a file format like
> Parquet instead.
>
> -Matt
>
>
> On Tue, Oct 17, 2023, 10:31 PM Felipe Oliveira Carvalho <[email protected]> wrote:
>
> > It’s not the best since the format is really focused on in-memory
> > representation and direct computation, but you can do it:
> >
> > https://arrow.apache.org/docs/python/feather.html
> >
> > —
> > Felipe
> >
> > On Tue, 17 Oct 2023 at 23:26 Nara <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > Is it a good idea to use Apache Arrow as a file format? Looks like
> > > projecting columns isn't available by default.
> > >
> > > One of the benefits of the Parquet file format is column projection,
> > > where the IO is limited to just the columns projected.
> > >
> > > Regards,
> > > Nara
> > >
> >
>
