JerAguilon opened a new issue, #38654:
URL: https://github.com/apache/arrow/issues/38654

   ### Describe the enhancement requested
   
   In the past I have implemented custom Acero nodes in Arrow C++. 
I vastly prefer wiring the actual DAG in Python and working with pyarrow-style 
tables/record batches on the output.
   
   The API for constructing an Acero `Declaration` looks like:
   
   ```
   class pyarrow.acero.Declaration(factory_name, ExecNodeOptions options, inputs=None)
   ```
   
   The issue is that `ExecNodeOptions` is a polymorphic type: each concrete 
subclass, such as `ScanNodeOptions` or `HashJoinNodeOptions`, needs its own 
bindings in Cython before it can be used from pyarrow.
   
   This means that, as external users, if we create a custom node, there's no 
ergonomic way to construct it from pyarrow without forking Arrow and adding 
bindings, which may be untenable if we want to keep up with mainline Acero.
   
   I propose a more generic options type that has Cython bindings out of the 
gate. Built-in nodes can keep their own specialized polymorphic types, but 
custom nodes can use this `GenericOption` to be constructible from pyarrow for free.
   
   In pseudocode, something like this would be really useful:
   
   ```
   class GenericOption(ExecNodeOptions):
      add_expression(string key, Expression)
      add_key_value(string key, string value)
      add_sort_key(string key, SortKey)
      ...
   ```
   
   Internally, it would store values in `unordered_map`s.
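
To make the storage idea concrete, here is a minimal pure-Python sketch of such a container. It is not an existing pyarrow or Acero API: the class, its attribute names, and the chaining behavior are all hypothetical, Python dicts stand in for the C++ `unordered_map`s, and expressions and sort keys are treated as opaque values.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class GenericOption:
    """Hypothetical stand-in for the proposed generic options type."""

    # One dict per value kind, mirroring "one unordered_map per kind".
    expressions: Dict[str, Any] = field(default_factory=dict)
    key_values: Dict[str, str] = field(default_factory=dict)
    sort_keys: Dict[str, Any] = field(default_factory=dict)

    def add_expression(self, key: str, expression: Any) -> "GenericOption":
        self.expressions[key] = expression
        return self  # returning self allows call chaining (an assumption)

    def add_key_value(self, key: str, value: str) -> "GenericOption":
        self.key_values[key] = value
        return self

    def add_sort_key(self, key: str, sort_key: Any) -> "GenericOption":
        self.sort_keys[key] = sort_key
        return self


# Example: options for an imaginary custom node. The keys and values are
# made up, and the tuple is only a placeholder for a real SortKey.
opts = (
    GenericOption()
    .add_key_value("window", "5s")
    .add_sort_key("order", ("timestamp", "ascending"))
)
```

A custom node's factory would then look up the keys it expects and reject missing or unknown ones, since a generic container cannot validate them itself.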
   
   I'm most interested in pyarrow, but I expect that this sort of binding would 
work for other languages too.
   
   ### Component(s)
   
   C++, Python

