alamb opened a new issue, #7926:
URL: https://github.com/apache/arrow-datafusion/issues/7926

   ### Is your feature request related to a problem or challenge?
   
   It is sometimes helpful to have custom table functions to extend DataFusion's functionality. As we continue to get more feature requests, such as #7859, it is important to support such use cases without having to add everything to the DataFusion core.
   
   For example, in the following query, `my_custom_fun` is a table function:
   
   ```sql
   SELECT foo, bar FROM my_custom_fun(city='NYC', year=2020)
   ```
   
   A specific example might be a function that fetches the contents of a remote CSV file and parses it into a table:

   ```sql
   SELECT date, value FROM parse_remote_csv('https://data.wa.gov/api/views/f6w7-q2d2/rows.csv?accessType=DOWNLOAD')
   ```
   
   You can do something similar with a [TableProvider](https://docs.rs/datafusion/latest/datafusion/datasource/provider/trait.TableProvider.html), but the main differences are:
   1. A `TableProvider` has no way to pass parameters
   2. A `TableProvider`'s schema is fixed (it can't be a function of the parameters)
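   To make the two differences concrete, here is a minimal, std-only sketch. `Field`, `TableLike`, `CsvHeaderTable`, and `csv_header_table` are hypothetical stand-ins rather than DataFusion APIs: a table function is modeled as a plain function from arguments to a table, and the schema of the returned table is derived from those arguments.

   ```rust
   use std::sync::Arc;

   /// Hypothetical stand-in for an Arrow field (name only, for brevity).
   #[derive(Debug, Clone, PartialEq)]
   pub struct Field {
       pub name: String,
   }

   /// Hypothetical stand-in for `TableProvider`: once a table exists,
   /// its schema is fixed and it cannot receive parameters.
   pub trait TableLike {
       fn schema(&self) -> Vec<Field>;
   }

   /// A table whose columns come from a CSV header line.
   pub struct CsvHeaderTable {
       columns: Vec<Field>,
   }

   impl TableLike for CsvHeaderTable {
       fn schema(&self) -> Vec<Field> {
           self.columns.clone()
       }
   }

   /// Hypothetical table function: its argument (a CSV header) determines
   /// the schema of the table it returns.
   pub fn csv_header_table(header: &str) -> Arc<dyn TableLike> {
       let columns = header
           .split(',')
           .map(|name| Field { name: name.trim().to_string() })
           .collect();
       Arc::new(CsvHeaderTable { columns })
   }
   ```

   For instance, `csv_header_table("date, value")` yields a table with columns `date` and `value`, while `csv_header_table("a,b,c")` yields three columns. A single registered `TableProvider` has no way to vary like this.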
   
   
   ## Prior Art
   
   Other examples include the [`read_parquet` and related functions in DuckDB](https://duckdb.org/docs/data/parquet/overview.html):
   
   ```sql
   SELECT * FROM read_parquet(['folder1/*.parquet', 'folder2/*.parquet']);
   -- ...
   SELECT * FROM parquet_schema('test.parquet');
   ```
   
   ### Describe the solution you'd like
   
   I would like to be able to have a table function that supports everything a `TableProvider` does, including filter and projection pushdown. One way to do so would be for the table function to return a `TableProvider`:
   
   # Option 1: Add to `FunctionRegistry`:
   
   We could add Table Functions to [datafusion](https://docs.rs/datafusion/latest/datafusion/index.html)::[execution](https://docs.rs/datafusion/latest/datafusion/execution/index.html)::[FunctionRegistry](https://docs.rs/datafusion/latest/datafusion/execution/trait.FunctionRegistry.html#) alongside UDFs, UDAFs, etc., which arguably would make them easier to discover.
   
   Something like:
   
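   ```rust
   use std::sync::Arc;
   use datafusion::datasource::TableProvider;
   use datafusion::error::Result;
   use datafusion::logical_expr::Expr;

   trait FunctionRegistry {
       // ... existing methods (`udf`, `udaf`, ...)

       /// Return a `TableProvider` for executing the `name` table function
       /// with the given arguments
       fn udtf(&self, name: &str, args: &[Expr]) -> Result<Arc<dyn TableProvider>>;
   }
   ```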
   
   We would probably also need a `TableUDF` trait along the lines of:
   
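   ```rust
   use std::sync::Arc;
   use datafusion::datasource::TableProvider;
   use datafusion::error::Result;
   use datafusion::logical_expr::Expr;

   trait TableUDF {
       /// Return a `TableProvider` for executing this table function, given
       /// the specified arguments
       fn invoke(&self, args: &[Expr]) -> Result<Arc<dyn TableProvider>>;
   }
   ```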
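   The following std-only sketch shows how a registry and a table-function trait could fit together: the registry maps names to table functions, and `udtf(name, args)` looks the function up and calls its `invoke` with the arguments. `Expr`, `Table`, `TableUdf`, `Registry`, and `GenerateSeries` are simplified, hypothetical stand-ins (not DataFusion types), and errors are plain `String`s.

   ```rust
   use std::collections::HashMap;
   use std::sync::Arc;

   /// Hypothetical stand-in for `Expr`: only literal arguments are modeled.
   #[derive(Debug, Clone, PartialEq)]
   pub enum Expr {
       Int(i64),
       Str(String),
   }

   /// Hypothetical stand-in for a `TableProvider`: column names plus rows.
   #[derive(Debug, PartialEq)]
   pub struct Table {
       pub columns: Vec<String>,
       pub rows: Vec<Vec<Expr>>,
   }

   /// Hypothetical analogue of the proposed table function trait.
   pub trait TableUdf: Send + Sync {
       fn invoke(&self, args: &[Expr]) -> Result<Table, String>;
   }

   /// Example table function: `generate_series(start, stop)` yields one
   /// `value` column holding the integers `start..=stop`.
   pub struct GenerateSeries;

   impl TableUdf for GenerateSeries {
       fn invoke(&self, args: &[Expr]) -> Result<Table, String> {
           match args {
               [Expr::Int(start), Expr::Int(stop)] => Ok(Table {
                   columns: vec!["value".to_string()],
                   rows: (*start..=*stop).map(|v| vec![Expr::Int(v)]).collect(),
               }),
               _ => Err("generate_series expects two integer arguments".to_string()),
           }
       }
   }

   /// Hypothetical analogue of the proposed `FunctionRegistry` additions.
   #[derive(Default)]
   pub struct Registry {
       udtfs: HashMap<String, Arc<dyn TableUdf>>,
   }

   impl Registry {
       pub fn register_udtf(&mut self, name: &str, f: Arc<dyn TableUdf>) {
           self.udtfs.insert(name.to_string(), f);
       }

       /// Look up the table function `name` and invoke it with `args`.
       pub fn udtf(&self, name: &str, args: &[Expr]) -> Result<Table, String> {
           let f = self
               .udtfs
               .get(name)
               .ok_or_else(|| format!("table function '{name}' not found"))?;
           f.invoke(args)
       }
   }
   ```

   With `GenerateSeries` registered as `generate_series`, calling `udtf("generate_series", &[Expr::Int(1), Expr::Int(3)])` returns a one-column table with rows 1, 2 and 3, while unknown names or wrongly typed arguments surface as errors.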
   
   
   ### Describe alternatives you've considered
   
   This API is very powerful and would allow Table Functions to do anything a table provider does. We could also potentially offer a stripped-down version of the API.
   
   We could probably add something like [datafusion](https://docs.rs/datafusion/latest/datafusion/index.html)::[logical_expr](https://docs.rs/datafusion/latest/datafusion/logical_expr/index.html)::[create_udf](https://docs.rs/datafusion/latest/datafusion/logical_expr/fn.create_udf.html#) to make it easier to construct basic table functions (e.g. ones that produce a single `SendableRecordBatchStream`).
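   By analogy with `create_udf`, a convenience constructor could wrap a plain closure that turns arguments into a single batch of rows. The sketch below is std-only and hypothetical: `create_udtf`, `SimpleTableFunction`, and `Batch` are illustrative names, and a `Batch` of integer rows stands in for the single `SendableRecordBatchStream` a real helper would produce.

   ```rust
   use std::sync::Arc;

   /// Hypothetical stand-in for one record batch: column names plus integer rows.
   #[derive(Debug, Clone, PartialEq)]
   pub struct Batch {
       pub columns: Vec<String>,
       pub rows: Vec<Vec<i64>>,
   }

   /// The body of a basic table function: arguments in, one batch out.
   type BatchFn = dyn Fn(&[i64]) -> Result<Batch, String> + Send + Sync;

   /// A named table function built from a closure.
   pub struct SimpleTableFunction {
       pub name: String,
       body: Arc<BatchFn>,
   }

   impl SimpleTableFunction {
       /// Run the wrapped closure with the call's arguments.
       pub fn call(&self, args: &[i64]) -> Result<Batch, String> {
           (self.body)(args)
       }
   }

   /// Hypothetical helper in the spirit of `create_udf`: it hides the trait
   /// plumbing so a basic table function is just a name and a closure.
   pub fn create_udtf<F>(name: &str, body: F) -> SimpleTableFunction
   where
       F: Fn(&[i64]) -> Result<Batch, String> + Send + Sync + 'static,
   {
       SimpleTableFunction { name: name.to_string(), body: Arc::new(body) }
   }
   ```

   For example, a hypothetical `squares` table function could be defined as a closure over `&[i64]` that returns one batch with `n` and `n_squared` columns, without implementing any trait by hand.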
   
   # Option 2: Add to `SchemaProvider`:
   We could also add Table Functions to [datafusion](https://docs.rs/datafusion/latest/datafusion/index.html)::[catalog](https://docs.rs/datafusion/latest/datafusion/catalog/index.html)::[schema](https://docs.rs/datafusion/latest/datafusion/catalog/schema/index.html)::[SchemaProvider](https://docs.rs/datafusion/latest/datafusion/catalog/schema/trait.SchemaProvider.html#)
   
   This might make sense given how similar `TableFunction`s are to `TableProvider`s.
   
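   ```rust
   use std::sync::Arc;
   use datafusion::datasource::TableProvider;
   use datafusion::error::Result;
   use datafusion::logical_expr::Expr;

   trait SchemaProvider {
       // ... existing methods

       /// Return a `TableProvider` for executing the `name` table function,
       /// given the specified arguments
       fn table_function(&self, name: &str, args: &[Expr]) -> Result<Arc<dyn TableProvider>>;

       /// Register the `TableFunction` with the specified name, returning the
       /// previously registered function, if any
       /// (`TableUDF` is the trait sketched under Option 1)
       fn register_table_function(
           &self,
           name: &str,
           function: Arc<dyn TableUDF>,
       ) -> Result<Option<Arc<dyn TableUDF>>>;
   }
   ```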
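   The "returning the previously registered function, if any" contract maps directly onto `HashMap::insert`, which returns the old value stored under a key. The sketch below is a std-only, hypothetical model of that registration behavior; `Schema`, `TableFn`, `register_table_function`, and `call_table_function` are illustrative names, not DataFusion's `SchemaProvider` API.

   ```rust
   use std::collections::HashMap;
   use std::sync::Arc;

   /// Hypothetical table function: string arguments in, rows of strings out.
   pub type TableFn = Arc<dyn Fn(&[String]) -> Vec<Vec<String>> + Send + Sync>;

   /// Hypothetical schema that keeps table functions by name.
   #[derive(Default)]
   pub struct Schema {
       table_functions: HashMap<String, TableFn>,
   }

   impl Schema {
       /// Register `function` under `name`, returning the previously
       /// registered function, if any (exactly what `HashMap::insert` returns).
       pub fn register_table_function(&mut self, name: &str, function: TableFn) -> Option<TableFn> {
           self.table_functions.insert(name.to_string(), function)
       }

       /// Look up `name` and, if it exists, call it with `args`.
       pub fn call_table_function(&self, name: &str, args: &[String]) -> Option<Vec<Vec<String>>> {
           self.table_functions.get(name).map(|f| f(args))
       }
   }
   ```

   Registering a second function under an existing name replaces the first and hands it back, so the caller can decide whether overwriting was intended.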
   
   
   
   ### Additional context
   
   I thought there was an existing ticket for this, but I cannot find one.
   
   This has come up several times, including:
   * https://github.com/apache/arrow-datafusion/issues/6518
   * On Discord: https://discord.com/channels/885562378132000778/1166447479609376850/1166702178916892793

