There is also Lance (https://github.com/lancedb/lance), which sits between the two formats. Depending on the use case, it can be a great choice.

Best regards
Adam Lippai

On Tue, Oct 17, 2023 at 22:44 Matt Topol <zotthewiz...@gmail.com> wrote:

> One benefit of the feather format (i.e. Arrow IPC file format) is the
> ability to mmap the file to easily handle reading sections of a
> larger-than-memory file of data. Since, as Felipe mentioned, the format is
> focused on in-memory representation, you can easily and simply mmap the
> file and use the raw bytes directly. For a large file that you only want to
> read sections of, this can be beneficial for IO and memory usage.
>
> Unfortunately, you are correct that it doesn't allow for easy column
> projecting (you're going to read all the columns for a record batch in the
> file, no matter what). So it's going to be a trade-off based on your needs
> as to whether it makes sense, or if you should use a file format like
> Parquet instead.
>
> -Matt
>
> On Tue, Oct 17, 2023, 10:31 PM Felipe Oliveira Carvalho
> <felipe...@gmail.com> wrote:
>
> > It's not the best since the format is really focused on in-memory
> > representation and direct computation, but you can do it:
> >
> > https://arrow.apache.org/docs/python/feather.html
> >
> > --
> > Felipe
> >
> > On Tue, 17 Oct 2023 at 23:26 Nara <narayanan.arunacha...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > Is it a good idea to use Apache Arrow as a file format? Looks like
> > > projecting columns isn't available by default.
> > >
> > > One of the benefits of Parquet file format is column projection, where
> > > the IO is limited to just the columns projected.
> > >
> > > Regards,
> > > Nara