One benefit of the Feather format (i.e. the Arrow IPC file format) is that you can mmap the file, which makes it easy to read sections of a larger-than-memory dataset. Since, as Felipe mentioned, the format is focused on the in-memory representation, you can mmap the file and use the raw bytes directly with no decoding step (provided the file was written without compression). For a large file that you only want to read sections of, this can reduce both IO and memory usage.
Unfortunately, you are correct that it doesn't make column projection easy (you're going to read all the columns for a record batch in the file, no matter what). So it's a trade-off that depends on your needs: either that cost is acceptable, or you should use a file format like Parquet instead.

-Matt

On Tue, Oct 17, 2023, 10:31 PM Felipe Oliveira Carvalho <felipe...@gmail.com> wrote:

> It’s not the best since the format is really focused on in-memory
> representation and direct computation, but you can do it:
>
> https://arrow.apache.org/docs/python/feather.html
>
> —
> Felipe
>
> On Tue, 17 Oct 2023 at 23:26 Nara <narayanan.arunacha...@gmail.com> wrote:
>
> > Hi,
> >
> > Is it a good idea to use Apache Arrow as a file format? Looks like
> > projecting columns isn't available by default.
> >
> > One of the benefits of Parquet file format is column projection, where the
> > IO is limited to just the columns projected.
> >
> > Regards,
> > Nara