I'm not great at this build stuff, but I think the basic idea is that
you'll need to package your custom nodes into a shared object, and
then trigger that shared object to load from Python.  Module import
time seems like a good place to invoke the Initialize method.

pyarrow already has to do this, because the datasets module
(libarrow_dataset.so) adds some custom nodes (scan node, dataset write
node).  The datasets module defines an Initialize method, which is
called in _exec_plan.pyx when the Python module is loaded.  I don't
know Cython well enough to say exactly how importing that module
triggers the datasets shared object to load.

On Tue, Sep 20, 2022 at 11:01 AM Li Jin <ice.xell...@gmail.com> wrote:
>
> Hi,
>
> Recently I have been working on adding a custom data source node to Acero and was
> pointed to a few examples in the dataset code.
>
> If I understand this correctly, the registering of dataset exec node is
> currently happening when this is loaded:
> https://github.com/apache/arrow/blob/master/python/pyarrow/_exec_plan.pyx#L36
>
> I wonder, if I have a custom "Initialize" method that registers additional
> ExecNodes, where is the right place to invoke such initialization?
> Eventually I want to execute my query via ibis-substrait and Acero
> substrait consumer Python API.
>
> Thanks,
> Li
