This is about the C++ API.

We are changing our underlying storage to be based on Parquet files instead of a proprietary format that we developed. Arrow's integration with Parquet makes it attractive as our cache layer, but I am having trouble finding documentation on reading Parquet files into Arrow using the C++ API, and the examples are somewhat limited.
Currently our own memory manager handles things like expiring data when it goes stale or when memory use goes above a threshold, and it has a tightly integrated API with our storage layer, i.e. you can request data from it even if it has not been loaded yet, and the cache layer will fetch that data directly from disk.

Do any utilities exist in Arrow for managing memory consumption and releasing data from the cache as consumption increases? Are there ways of detecting when a piece of data was last accessed?

If I wanted a cache layer that leverages Arrow but can still read data directly from Parquet when it has not been loaded into Arrow, is the best approach to have some kind of manager that loads the data into Arrow when it is not available? Or is there an API through which Arrow can be told that, if this data is requested, it needs to be loaded from this Parquet file?

Are there any docs available for the C++ APIs of Arrow and Parquet other than what is found at

https://arrow.apache.org/docs/cpp/
https://github.com/apache/parquet-cpp

Felipe Aramburu
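
P.S. To make the "manager" option concrete, here is a minimal sketch of a load-on-miss cache with a byte budget and least-recently-used eviction. It uses only the C++ standard library and makes no Arrow or Parquet calls. `Block`, `LoadOnMissCache`, and the loader callback are hypothetical names made up for illustration; in a real system the loader would presumably be the piece that reads from a Parquet file and returns Arrow data, and the loader is assumed never to return null.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for whatever is cached (e.g. one decoded column chunk).
struct Block {
  std::string payload;
  std::size_t bytes() const { return payload.size(); }
};

// Hypothetical load-on-miss cache: callers ask for a key, and the cache either
// returns the resident block or calls the loader to fetch it from storage.
class LoadOnMissCache {
 public:
  // Assumption: the loader always returns a non-null block.
  using Loader = std::function<std::shared_ptr<Block>(const std::string&)>;

  LoadOnMissCache(std::size_t byte_budget, Loader loader)
      : budget_(byte_budget), loader_(std::move(loader)) {}

  std::shared_ptr<Block> Get(const std::string& key) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      // Hit: move the entry to the front so it counts as most recently used.
      lru_.splice(lru_.begin(), lru_, it->second);
      return it->second->second;
    }
    // Miss: fetch from the backing store, record it, then trim to the budget.
    std::shared_ptr<Block> block = loader_(key);
    lru_.emplace_front(key, block);
    index_[key] = lru_.begin();
    used_ += block->bytes();
    Evict();
    return block;
  }

  bool Contains(const std::string& key) const { return index_.count(key) != 0; }
  std::size_t used_bytes() const { return used_; }

 private:
  using Entry = std::pair<std::string, std::shared_ptr<Block>>;

  // Drop least recently used entries until the budget is met. The newest
  // entry is always kept, even if it alone exceeds the budget.
  void Evict() {
    while (used_ > budget_ && lru_.size() > 1) {
      const Entry& victim = lru_.back();
      used_ -= victim.second->bytes();
      index_.erase(victim.first);
      lru_.pop_back();
    }
  }

  std::size_t budget_;
  Loader loader_;
  std::size_t used_ = 0;
  std::list<Entry> lru_;  // front = most recently used
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

The list order doubles as the "last accessed" record: the back of the list is always the entry that has gone longest without a request. Because blocks are handed out as `std::shared_ptr`, evicting an entry only drops the cache's reference, so a caller still holding the block keeps it alive. What I don't know is whether Arrow already offers something equivalent.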
