This question is about the C++ API.
We are changing our underlying storage to be based on Parquet files instead
of a proprietary format that we developed. Arrow's integration with Parquet
makes it attractive as our cache layer, but I am having trouble finding much
documentation on reading Parquet files into Arrow using the C++ API, and the
examples are somewhat limited.

Currently, our own memory manager handles things like expiring data when it
is stale or when memory use goes above a threshold, and its API is tightly
integrated with our storage layer. That is, you can request data from it
even if that data has not been loaded yet, and the cache layer will fetch it
directly from disk.
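
To make that behavior concrete, here is a minimal sketch of the kind of
load-through cache I mean. It is not our actual code and it uses no Arrow
calls. Every name in it (Block, LoadThroughCache, Get, LastAccess) is
hypothetical, the loader callback stands in for whatever reads from disk,
and "stale" is assumed to mean "loaded longer ago than max_age".

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for loaded data; a real system might hold an
// Arrow table or record batch here instead.
using Block = std::vector<char>;
using BlockPtr = std::shared_ptr<const Block>;

class LoadThroughCache {
 public:
  using Clock = std::chrono::steady_clock;
  using Loader = std::function<BlockPtr(const std::string&)>;

  LoadThroughCache(std::size_t max_bytes, Clock::duration max_age, Loader loader)
      : max_bytes_(max_bytes), max_age_(max_age), loader_(std::move(loader)) {
    assert(loader_ && "a loader callback is required");
  }

  // Returns the block for `key`, calling the loader on a miss or when the
  // cached copy is older than max_age (assumed definition of "stale").
  BlockPtr Get(const std::string& key, Clock::time_point now = Clock::now()) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      if (now - it->second->loaded_at <= max_age_) {
        it->second->last_access = now;
        lru_.splice(lru_.begin(), lru_, it->second);  // mark most recently used
        return it->second->data;
      }
      Erase(it);  // stale entry: drop it and reload below
    }
    BlockPtr data = loader_(key);
    if (!data) return nullptr;  // loader could not produce the data
    lru_.push_front(Entry{key, data, now, now});
    index_[key] = lru_.begin();
    bytes_ += data->size();
    // Evict least recently used entries while over budget, but always keep
    // the entry that was just loaded.
    while (bytes_ > max_bytes_ && lru_.size() > 1) {
      Erase(index_.find(lru_.back().key));
    }
    return data;
  }

  // Writes the last access time of `key` to `out`; returns false if absent.
  bool LastAccess(const std::string& key, Clock::time_point* out) const {
    auto it = index_.find(key);
    if (it == index_.end()) return false;
    *out = it->second->last_access;
    return true;
  }

  bool Contains(const std::string& key) const { return index_.count(key) != 0; }
  std::size_t bytes() const { return bytes_; }

 private:
  struct Entry {
    std::string key;
    BlockPtr data;
    Clock::time_point loaded_at;
    Clock::time_point last_access;
  };
  using Index = std::unordered_map<std::string, std::list<Entry>::iterator>;

  void Erase(Index::iterator it) {
    bytes_ -= it->second->data->size();
    lru_.erase(it->second);
    index_.erase(it);
  }

  std::size_t max_bytes_;
  Clock::duration max_age_;
  Loader loader_;
  std::size_t bytes_ = 0;
  std::list<Entry> lru_;  // front = most recently used
  Index index_;
};
```

Callers only ever call Get(); whether the data was already in memory or had
to be read from disk is hidden behind the loader callback. Because blocks
are handed out as shared pointers, a block evicted from the cache stays
valid for any caller that still holds it.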

Do any utilities exist in Arrow for managing memory consumption and
releasing data from the cache as consumption increases? Is there a way to
detect when a given piece of data was last accessed?

If I wanted a cache layer that leverages Arrow but can still access data
directly from Parquet when it has not been loaded into Arrow, is the best
approach to write some kind of manager that loads the data into Arrow when
it is not available? Or is there an API that lets Arrow know that, when
this data is requested, it needs to be loaded from this Parquet file?

Are there any docs available for the C++ APIs of Arrow and Parquet other
than what is found at

https://arrow.apache.org/docs/cpp/
https://github.com/apache/parquet-cpp


Felipe Aramburu