It seems to me that your use case could be handled by defining a custom NamedTableProvider and assigning it to ConversionOptions::named_table_provider. This was added in https://github.com/apache/arrow/pull/13613 to provide user-configurable dispatching for named tables; if it doesn't address your use case, then we might want to create a JIRA to extend it.
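A minimal sketch of the kind of provider suggested above. It assumes a provider is a callable from the NamedTable's name parts to a Declaration, which is roughly how Arrow C++ models it, though the exact signature and return type have varied between releases (check the header that declares ConversionOptions in your version). To stay self-contained it uses hypothetical stand-in types (`Declaration`, `NamedTableProvider`, `MakeProvider`) instead of the real Arrow headers; the `my_datasource://` prefix is borrowed from Li's proposal in the quoted message.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for arrow::compute::Declaration; NOT the Arrow type.
struct Declaration {
  std::string factory_name;  // e.g. "my_source" or "table_source"
  std::string argument;      // simplified stand-in for the node options
};

// Stand-in for NamedTableProvider: maps a NamedTable's name parts to a
// Declaration. Arrow's version reports errors via arrow::Result, not throw.
using NamedTableProvider =
    std::function<Declaration(const std::vector<std::string>& names)>;

// Builds a provider that sends "my_datasource://..." names to the custom
// "my_source" node and resolves every other name from in-memory tables.
NamedTableProvider MakeProvider(std::map<std::string, std::string> in_memory) {
  return [in_memory = std::move(in_memory)](
             const std::vector<std::string>& names) -> Declaration {
    if (names.empty()) throw std::invalid_argument("NamedTable has no names");
    const std::string& name = names.front();
    const std::string prefix = "my_datasource://";
    if (name.compare(0, prefix.size(), prefix) == 0) {
      // Everything after the scheme is handed to the custom source node.
      return Declaration{"my_source", name.substr(prefix.size())};
    }
    // Fallback: behave like a plain name -> in-memory table lookup.
    auto it = in_memory.find(name);
    if (it == in_memory.end()) throw std::out_of_range("unknown table: " + name);
    return Declaration{"table_source", it->second};
  };
}
```

With a real Arrow build, a lambda of this shape would be what gets assigned to ConversionOptions::named_table_provider before the plan is deserialized; the fallback branch keeps ordinary named tables working alongside the custom source.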
On Tue, Sep 27, 2022 at 10:41 AM Li Jin <ice.xell...@gmail.com> wrote:
> I did some more digging into this and have some ideas.
>
> Currently, the logic for deserializing named tables is:
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/engine/substrait/relation_internal.cc#L129
>
> and it looks up named tables in a user-provided dictionary mapping
> string -> arrow Table.
>
> My idea is to make some short-term changes to allow named tables to be
> dispatched differently (this logic can be reverted/removed once we figure
> out the proper way to support custom data sources, perhaps via Substrait
> extensions), specifically:
>
> (1) The user creates a named table with a URI for the custom data source,
> e.g., "my_datasource://tablename?begin=20200101&end=20210101"
> (2) In the Substrait consumer, allow the user to register custom dispatch
> rules based on the URI scheme (similar to how the exec node registry
> works), e.g., something like:
>
> substrait_named_table_registry.add("my_datasource", deser_my_datasource)
>
> where deser_my_datasource is a function that takes the NamedTable Substrait
> message and returns a declaration.
>
> I know doing this just for named tables might not be a very general
> solution, but it seems the easiest path forward, and we can always remove
> this later in favor of a more generic solution.
>
> Thoughts?
>
> Li
>
> On Mon, Sep 26, 2022 at 10:58 AM Li Jin <ice.xell...@gmail.com> wrote:
>
> > Hello!
> >
> > I am working on adding a custom data source node in Acero. I have a few
> > previous threads related to this topic.
> >
> > Currently, I am able to register my custom factory method with Acero and
> > create a custom source node, i.e., I can register and execute this with
> > Acero:
> >
> > MySourceNodeOptions source_options = ...
> > Declaration source{"my_source", source_options};
> >
> > The next step I want to do is to pass this through to the Acero Substrait
> > consumer.
> > From previous discussions, I am going to use "NamedTable" as a
> > temporary way to define my custom data source in Substrait. My question
> > is this:
> >
> > What do I need to do in Substrait in order to register my own Substrait
> > consumer rule/function for deserializing my custom named table protobuf
> > message into the declaration above? If this is not supported right now,
> > what is a reasonable/minimal change to make this work?
> >
> > Thanks,
> > Li
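The scheme-based dispatch proposed in the Sep 27 message could be sketched roughly as follows. This is illustrative only: `NamedTableRegistry`, `ParsedUri`, `ParseNamedTableUri`, and the stand-in `Declaration` are hypothetical and not part of Arrow, and unlike the proposal, `deser_my_datasource` here receives the parsed URI rather than the NamedTable Substrait message. The parser assumes the `scheme://table?key=value&...` shape from the example and does no percent-decoding.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for arrow::compute::Declaration; NOT the Arrow type.
struct Declaration {
  std::string factory_name;
  std::map<std::string, std::string> options;
};

// Pieces of a URI such as "my_datasource://tablename?begin=20200101&end=20210101".
struct ParsedUri {
  std::string scheme;
  std::string table;
  std::map<std::string, std::string> params;
};

// Minimal parser: splits scheme, table name, and query parameters.
ParsedUri ParseNamedTableUri(const std::string& uri) {
  const auto sep = uri.find("://");
  if (sep == std::string::npos) throw std::invalid_argument("no scheme: " + uri);
  ParsedUri out;
  out.scheme = uri.substr(0, sep);
  const std::string rest = uri.substr(sep + 3);
  const auto q = rest.find('?');
  out.table = rest.substr(0, q);  // whole remainder if there is no query
  if (q == std::string::npos) return out;
  const std::string query = rest.substr(q + 1);
  std::size_t start = 0;
  while (start <= query.size()) {
    std::size_t amp = query.find('&', start);
    if (amp == std::string::npos) amp = query.size();
    const std::string kv = query.substr(start, amp - start);
    const auto eq = kv.find('=');
    if (!kv.empty())
      out.params[kv.substr(0, eq)] = eq == std::string::npos ? "" : kv.substr(eq + 1);
    start = amp + 1;
  }
  return out;
}

// Registry keyed by URI scheme, mirroring the proposed
// substrait_named_table_registry.add("my_datasource", deser_my_datasource).
class NamedTableRegistry {
 public:
  using Deserializer = std::function<Declaration(const ParsedUri&)>;

  void add(const std::string& scheme, Deserializer fn) {
    if (!rules_.emplace(scheme, std::move(fn)).second)
      throw std::invalid_argument("scheme already registered: " + scheme);
  }

  Declaration Dispatch(const std::string& uri) const {
    const ParsedUri parsed = ParseNamedTableUri(uri);
    const auto it = rules_.find(parsed.scheme);
    if (it == rules_.end())
      throw std::out_of_range("no rule for scheme: " + parsed.scheme);
    return it->second(parsed);
  }

 private:
  std::map<std::string, Deserializer> rules_;
};

// Example rule: turn the URI into options for the custom "my_source" node.
Declaration deser_my_datasource(const ParsedUri& uri) {
  Declaration decl{"my_source", uri.params};
  decl.options["table"] = uri.table;
  return decl;
}
```

Keying rules by scheme mirrors how factory names are looked up in the exec node registry: adding a new data source only requires one more `add` call, and an unregistered scheme fails loudly instead of falling through silently.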