Hey all,

Is there much interest in adding the capability to do Arrow <=> Protobuf conversion in C++?
I'm working on this for a side project, but I was wondering if there is much interest from the broader Arrow community. If so, I might be able to find time to contribute it.

To get the point across, here is a strawman API. In reality, we would likely need some sort of builder API that allows incrementally adding protos, and a generator-like API for returning the protos from a table.

"""
// Pair of functions using templates to work with any message type
template <class T>
Result<std::shared_ptr<Table>> ProtosToTable(const std::vector<T>& protos);

template <class T>
Result<std::vector<T>> TableToProtos(const std::shared_ptr<Table>& table);

// Pair of functions using google::protobuf::Message and polymorphism to work
// with any message type
Result<std::shared_ptr<Table>> ProtosToTable(
    const std::vector<google::protobuf::Message*>& protos);

// I don't like that this returns a vector of unique pointers. Is there a
// better way to return a vector of base classes while retaining polymorphic
// behavior?
Result<std::vector<std::unique_ptr<google::protobuf::Message>>> TableToProtos(
    const std::shared_ptr<Table>& table,
    const google::protobuf::Descriptor* descriptor);
"""

My particular use case for these functions is that I would like to use protobufs for the in-memory data representation, since they provide strongly typed classes which are very expressive (they can have nested/repeated fields) and a well-established path for schema evolution. However, I would like to use Parquet as the data storage layer (and Arrow as the glue between the two) so that I can take advantage of technologies like Presto for querying the data. I'm hoping that backwards-compatible changes to the proto schema turn into backwards-compatible changes in the Parquet files. I'm also a bit curious to see if Arrow allows faster deserialization when compared to a list of serialized protos on disk.