Hello Kohei,

You can create an arrow::io::BufferReader to wrap your in-memory buffer:
https://arrow.apache.org/docs/cpp/api/io.html#in-memory-streams

and then pass it to parquet::arrow::FileReaderBuilder:
https://arrow.apache.org/docs/cpp/api/formats.html#_CPPv4N7parquet5arrow17FileReaderBuilderE

(BufferReader subclasses RandomAccessFile, which is the input type that
FileReaderBuilder::Open expects.)

Regards

Antoine.


On 19/06/2023 at 14:57, Kohei Yoshida wrote:
Hello there,

I would like to get some guidance on how to load Parquet files from
in-memory buffers.  I have already managed to load from files by
following this tutorial:

https://arrow.apache.org/docs/cpp/parquet.html

and I did spend some time looking around the Arrow API to figure out a
way to load from in-memory buffers.  But so far no luck.

Is there a way to achieve this using the existing Arrow API?  Any help
or guidance would be appreciated.

A little background on why I'm doing this.  I'm currently working on
implementing an import filter for the Parquet file format for LibreOffice
Calc, and I'm doing so via the orcus library[1], which specializes in
providing spreadsheet-related file format filters as an external
library.  The orcus library itself provides APIs for both loading
from files and loading from in-memory buffers for all file formats it
supports.  LibreOffice itself uses orcus's in-memory buffer API to
achieve file loading due to the way its file loading mechanism works.
Currently, I'm temporarily saving the incoming buffer to a temporary
file and loading from it, but that's far from ideal...

Thanks,

Kohei Yoshida

[1] https://gitlab.com/orcus/orcus
