Hi,

Is there support for accessing Substrait protobuf Python classes (such as Plan) 
from PyArrow? If not, how should such support be added? For example, should the 
PyArrow build system pull in the Substrait repo as an external project and 
build its protobuf Python classes, in a manner similar to how Arrow C++ does it?

I'm asking because I ran into an issue with code I'm writing under PyArrow 
that parses a Substrait plan represented as a dictionary. The current (and 
somewhat shaky) approach calls json.dumps() on the dictionary and passes the 
resulting string to a Cython API, which hands it to Arrow C++ code that has 
access to the Substrait protobuf C++ classes. But when the Substrait plan 
contains a value of type bytes, json.dumps() fails with "TypeError: Object of 
type bytes is not JSON serializable". A fix for this, and a better way to 
parse, is to use google.protobuf.json_format.ParseDict() [1] on the 
dictionary. However, that call requires a second argument: a protobuf message 
instance to merge the parsed data into. The class of this message (such as 
Plan) is a Substrait protobuf Python class, hence the need to access such 
classes from PyArrow.
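
To make the failure concrete, here is a minimal reproduction using only the 
standard library. The dictionary is a made-up stand-in, not a real Substrait 
plan, and the field names are hypothetical. The second half sketches one 
possible stopgap, which is an assumption on my part and not what the code does 
today: base64-encoding bytes values before serializing, since the protobuf 
JSON mapping represents bytes fields as base64 strings. It does not remove the 
need for the Python classes if ParseDict() is the preferred route.

```python
import base64
import json

# Hypothetical plan fragment; keys are illustrative only, not a real Substrait plan.
plan_dict = {"version": {"producer": "example"}, "payload": b"\x00\x01binary"}

# Reproduce the failure: the json module has no default encoding for bytes.
try:
    json.dumps(plan_dict)
    error_message = None
except TypeError as exc:
    error_message = str(exc)


def encode_bytes(obj):
    """Fallback for json.dumps(): base64-encode bytes, reject anything else."""
    if isinstance(obj, (bytes, bytearray)):
        return base64.b64encode(obj).decode("ascii")
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Possible stopgap (an assumption): serialize with bytes as base64 strings.
plan_json = json.dumps(plan_dict, default=encode_bytes)
print(error_message)
print(plan_json)
```

Whether the base64 string is decoded correctly on the C++ side depends on the 
target field actually being declared as bytes in the Substrait schema, which I 
have not verified for every case.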

[1] 
https://googleapis.dev/python/protobuf/latest/google/protobuf/json_format.html


Yaron.
