Thanks, Weston. I have not written a Python/C++ bridge before, so this is new to me as well, and I am hoping to get some pointers from people who know how to do this.
I will leave this open for other people to offer help :) and will ask some internal folks as well. Will circle back on this.

On Tue, Sep 20, 2022 at 8:50 PM Weston Pace <weston.p...@gmail.com> wrote:

> I'm not great at this build stuff but I think the basic idea is that
> you will need to package your custom nodes into a shared object.
> You'll need to then somehow trigger that shared object to load from
> python. This seems like a good place to invoke the initialize method.
>
> Currently pyarrow has to do this because the datasets module
> (libarrow_dataset.so) adds some custom nodes (scan node, dataset write
> node). The datasets module defines the Initialize method. This
> method is called in _exec_plan.pyx when the python module is loaded.
> I don't know cython well enough to know how exactly it triggers the
> datasets shared object to load.
>
> On Tue, Sep 20, 2022 at 11:01 AM Li Jin <ice.xell...@gmail.com> wrote:
> >
> > Hi,
> >
> > Recently I am working on adding a custom data source node to Acero and was
> > pointed to a few examples in the dataset code.
> >
> > If I understand this correctly, the registering of dataset exec node is
> > currently happening when this is loaded:
> > https://github.com/apache/arrow/blob/master/python/pyarrow/_exec_plan.pyx#L36
> >
> > I wonder if I have a custom "Initialize" method that registers additional
> > ExecNode, where is the right place to invoke such initialization?
> > Eventually I want to execute my query via ibis-substrait and Acero
> > substrait consumer Python API.
> >
> > Thanks,
> > Li
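
To make the "load the shared object from python, then invoke the initialize method" idea above concrete, here is a rough, untested-against-Arrow sketch using ctypes from the standard library. Everything specific in it is an assumption on my part: the library name libmy_nodes.so, the exported symbol my_nodes_initialize, and the convention that it is an extern "C" function taking no arguments and returning 0 on success are all hypothetical. It also assumes the custom library links against the same libarrow that pyarrow has already loaded, so that the nodes end up in the registry pyarrow actually uses. This is not how pyarrow itself does it (that goes through Cython in _exec_plan.pyx).

```python
import ctypes


def load_and_initialize(lib_path, init_symbol="my_nodes_initialize"):
    """Load a shared object and call its zero-argument initializer.

    `init_symbol` is a hypothetical extern "C" function that is assumed
    to register the custom ExecNode factories and return 0 on success.
    """
    # RTLD_GLOBAL makes the library's symbols visible to libraries that
    # are loaded afterwards. Raises OSError if the file cannot be loaded.
    lib = ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)

    # Raises AttributeError if the symbol is not exported by the library.
    init = getattr(lib, init_symbol)
    init.argtypes = []
    init.restype = ctypes.c_int

    status = init()
    if status != 0:
        raise RuntimeError(f"{init_symbol} failed with status {status}")

    # Return the handle so the caller can keep a reference to it.
    return lib


# Intended usage (hypothetical path; import pyarrow first so that
# libarrow is already loaded in the process):
#
#   import pyarrow
#   _my_nodes = load_and_initialize("/path/to/libmy_nodes.so")
```

Calling this once at import time of a small Python package would mirror what Weston describes for the datasets module: the registration happens as a side effect of loading the module, before any plan is executed.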