Hi,

Is there support for accessing Substrait protobuf Python classes (such as Plan) from PyArrow? If not, how should such support be added? For example, should the PyArrow build system pull in the Substrait repo as an external project and build its protobuf Python classes, in a manner similar to how Arrow C++ does it?
I'm pondering these questions after running into an issue with code I'm writing under PyArrow that parses a Substrait plan represented as a dictionary. The current (and somewhat shaky) parsing operation in this code calls json.dumps() on the dictionary and passes the resulting string to a Cython API, which handles it using Arrow C++ code that has access to the Substrait protobuf C++ classes. But when the Substrait plan contains a value of type bytes, json.dumps() no longer works and fails with "TypeError: Object of type bytes is not JSON serializable".

A fix for this, and a better way to parse, is to use google.protobuf.json_format.ParseDict() [1] on the dictionary. However, this call requires a second argument, namely a protobuf message instance to merge into. The class of this message (such as Plan) is a Substrait protobuf Python class, hence the need to access such classes from PyArrow.

[1] https://googleapis.dev/python/protobuf/latest/google/protobuf/json_format.html

Yaron.
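
P.S. For reference, here is a minimal sketch that reproduces the json.dumps() failure using only the standard library. It also shows one possible stopgap: base64-encoding bytes values before dumping, which is how the protobuf JSON mapping represents bytes fields. The helper encode_bytes() and the dictionary layout are hypothetical and only for illustration (this is not a real Substrait plan), and the stopgap assumes the C++ side parses the string with protobuf's JSON parser.

```python
import base64
import json


def encode_bytes(obj):
    """Recursively replace bytes values with base64 text.

    Hypothetical helper: the protobuf JSON mapping represents bytes
    fields as base64 strings, so this makes the dict JSON-serializable.
    """
    if isinstance(obj, (bytes, bytearray)):
        return base64.b64encode(bytes(obj)).decode("ascii")
    if isinstance(obj, dict):
        return {key: encode_bytes(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [encode_bytes(value) for value in obj]
    return obj


# Illustrative dictionary only; the field names are made up.
plan_dict = {"relations": [{"literal": {"binary": b"\x00\x01"}}]}

# Reproduce the failure: json.dumps() cannot serialize bytes.
try:
    json.dumps(plan_dict)
    error = None
except TypeError as exc:
    error = str(exc)

# Stopgap: encode bytes as base64 text first, then dump.
plan_json = json.dumps(encode_bytes(plan_dict))
```

This only works around the serialization error. It does not remove the need for the Substrait protobuf Python classes if ParseDict() is to be used.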