Hey all,

Is there much interest in adding the capability to do Arrow <=> Protobuf
conversion in C++?

I'm working on this for a side project, but I was wondering if there is
much interest from the broader Arrow community. If so, I might be able to
find time to contribute it.

To get the point across, here is a strawman API. In reality, we would
likely need some sort of builder API which allows incrementally adding
protos and a generator-like API for returning the protos from a table.

"""
// Pair of functions using templates to work with any message type
template <class T>
Result<std::shared_ptr<Table>> ProtosToTable(const std::vector<T>& protos);

template <class T>
Result<std::vector<T>> TableToProtos(const std::shared_ptr<Table>& table);

// Pair of functions using google::protobuf::Message and polymorphism to
// work with any message type
Result<std::shared_ptr<Table>> ProtosToTable(
    const std::vector<google::protobuf::Message*>& protos);

// I don't like that this returns a vector of unique pointers. Is there a
// better way to return a vector of base classes while retaining polymorphic
// behavior?
Result<std::vector<std::unique_ptr<google::protobuf::Message>>> TableToProtos(
    const std::shared_ptr<Table>& table,
    const google::protobuf::Descriptor* descriptor);
"""

My particular use case for these functions is that I would like to use
protobufs for the in-memory data representation, as they provide strongly
typed classes which are very expressive (can have nested/repeated fields)
and a well-established path for schema evolution. However, I would like to
use Parquet as the data storage layer (and Arrow as the glue between the
two) so that I can take advantage of technologies like Presto for querying
the data. I'm hoping that backwards-compatible changes to the proto schema
turn into backwards-compatible changes in the Parquet files. I'm also a bit
curious to see whether Arrow allows faster deserialization when compared to
a list of serialized protos on disk.
