alamb opened a new issue, #7926:
URL: https://github.com/apache/arrow-datafusion/issues/7926
### Is your feature request related to a problem or challenge?
It is sometimes helpful to have custom table functions to extend
DataFusion's functionality. As we continue to get more feature requests, such
as #7859, it is important to support such use cases without having to add
everything to the DataFusion core.
For example, in the following query `my_custom_fun` is a table function:
```sql
SELECT foo, bar FROM my_custom_fun(city='NYC', year=2020)
```
A specific example might be a function that fetches the contents of a remote
CSV file and parses it into a table:
```sql
SELECT date, value FROM
parse_remote_csv('https://data.wa.gov/api/views/f6w7-q2d2/rows.csv?accessType=DOWNLOAD')
```
You can do something similar with a
[TableProvider](https://docs.rs/datafusion/latest/datafusion/datasource/provider/trait.TableProvider.html),
but the main differences are:
1. A `TableProvider` has no way to accept parameters
2. A `TableProvider`'s schema is fixed (it can't be a function of the parameters)
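To make the second difference concrete, here is a minimal sketch of a table function whose output schema depends on its argument, in the spirit of `parse_remote_csv` but reading CSV text passed in directly instead of fetching a URL. `Schema`, `Table`, and `parse_csv_text` are hypothetical, standard-library-only stand-ins (not DataFusion or Arrow types), and the parser naively splits on commas with no quoting support.

```rust
/// Hypothetical stand-in for an Arrow schema: just the column names.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub columns: Vec<String>,
}

/// Hypothetical result of invoking a table function: a schema plus rows.
#[derive(Debug, Clone, PartialEq)]
pub struct Table {
    pub schema: Schema,
    pub rows: Vec<Vec<String>>,
}

/// Hypothetical `parse_csv_text(csv)` table function. The schema is derived
/// from the header row of the argument, so different arguments produce
/// different schemas. Naive comma splitting, for illustration only.
pub fn parse_csv_text(csv: &str) -> Result<Table, String> {
    // Skip blank lines so a trailing newline does not become an empty row.
    let mut lines = csv.lines().filter(|l| !l.trim().is_empty());
    let header = lines.next().ok_or_else(|| "empty input".to_string())?;
    let columns: Vec<String> = header.split(',').map(|c| c.trim().to_string()).collect();

    let mut rows = Vec::new();
    for line in lines {
        let row: Vec<String> = line.split(',').map(|v| v.trim().to_string()).collect();
        // Every data row must match the schema computed from the header.
        if row.len() != columns.len() {
            return Err(format!("expected {} fields, got {}", columns.len(), row.len()));
        }
        rows.push(row);
    }
    Ok(Table { schema: Schema { columns }, rows })
}
```

Different arguments yield different schemas, which a `TableProvider` registered ahead of time cannot express.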
## Prior Art
Other examples include [`read_parquet` and similar functions in
DuckDB](https://duckdb.org/docs/data/parquet/overview.html):
```sql
SELECT * FROM read_parquet(['folder1/*.parquet', 'folder2/*.parquet']);
...
SELECT * FROM parquet_schema('test.parquet');
```
### Describe the solution you'd like
I would like to be able to have a table function that supports everything a
`TableProvider` does, including filter and projection pushdown. One way to do so
would be to return a `TableProvider`:
# Option 1: Add to `FunctionRegistry`:
We could add Table Functions to the
[datafusion](https://docs.rs/datafusion/latest/datafusion/index.html)::[execution](https://docs.rs/datafusion/latest/datafusion/execution/index.html)::[FunctionRegistry](https://docs.rs/datafusion/latest/datafusion/execution/trait.FunctionRegistry.html#)
along with the UDFs, UDAFs, etc., which would arguably make them easier to
discover.
Something like:
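```rust
use std::sync::Arc;
use datafusion::datasource::TableProvider;
use datafusion::error::Result;
use datafusion::logical_expr::Expr;

trait FunctionRegistry {
    // ... existing methods ...

    /// Return a `TableProvider` for executing the `name` table function
    fn udtf(&self, name: &str, args: &[Expr]) -> Result<Arc<dyn TableProvider>>;
}
```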
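The lookup half of this proposal can be sketched without DataFusion at all. Below, `Arg`, `Table`, `TableFn`, and `Registry` are hypothetical stand-ins for `Expr`, `TableProvider`, and `FunctionRegistry` respectively, assuming only literal arguments; the point is that `udtf` resolves a name to an implementation and hands it the call-site arguments.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical simplified argument type standing in for `Expr`
/// (only literal arguments are modeled here).
#[derive(Debug, Clone, PartialEq)]
pub enum Arg {
    Int(i64),
    Str(String),
}

/// Hypothetical stand-in for a `TableProvider`: column names plus rows.
#[derive(Debug, Clone, PartialEq)]
pub struct Table {
    pub columns: Vec<String>,
    pub rows: Vec<Vec<String>>,
}

/// Signature of a (hypothetical) table function implementation.
pub type TableFn = Arc<dyn Fn(&[Arg]) -> Result<Table, String> + Send + Sync>;

/// Minimal registry holding table functions by name.
#[derive(Default)]
pub struct Registry {
    table_functions: HashMap<String, TableFn>,
}

impl Registry {
    /// Register (or replace) the table function called `name`.
    pub fn register_udtf(&mut self, name: &str, f: TableFn) {
        self.table_functions.insert(name.to_string(), f);
    }

    /// Analogous to the proposed `udtf`: look up `name`, then invoke it with `args`.
    pub fn udtf(&self, name: &str, args: &[Arg]) -> Result<Table, String> {
        let f = self
            .table_functions
            .get(name)
            .ok_or_else(|| format!("table function '{name}' not found"))?;
        f(args)
    }
}
```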
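We would probably also need a `TableUDF` trait:
```rust
use std::sync::Arc;
use datafusion::datasource::TableProvider;
use datafusion::error::Result;
use datafusion::logical_expr::Expr;

trait TableUDF {
    /// Return a `TableProvider` for executing this table function, given
    /// the specified arguments
    fn invoke(&self, args: &[Expr]) -> Result<Arc<dyn TableProvider>>;
}
```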
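As an illustration of implementing such a trait, here is a sketch of a hypothetical `range(start, end)` table function. `Arg`, `Provider`, and `TableUdf` are simplified, hypothetical stand-ins for `Expr`, `TableProvider`, and the proposed `TableUDF`, so the block compiles with only the standard library. Argument validation happens in `invoke`, before any provider is built.

```rust
use std::sync::Arc;

/// Hypothetical literal argument (stand-in for `Expr`).
#[derive(Debug, Clone, PartialEq)]
pub enum Arg {
    Int(i64),
    Str(String),
}

/// Hypothetical stand-in for a `TableProvider` with one Int64 column.
pub trait Provider {
    fn column_name(&self) -> &str;
    fn scan(&self) -> Vec<i64>;
}

/// Mirrors the shape of the proposed `TableUDF` trait.
pub trait TableUdf {
    fn invoke(&self, args: &[Arg]) -> Result<Arc<dyn Provider>, String>;
}

/// Provider returned by `range`; produces `start..end` (end exclusive).
struct RangeProvider {
    start: i64,
    end: i64,
}

impl Provider for RangeProvider {
    fn column_name(&self) -> &str {
        "value"
    }
    fn scan(&self) -> Vec<i64> {
        (self.start..self.end).collect()
    }
}

/// Hypothetical `range(start, end)` table function.
pub struct RangeFunction;

impl TableUdf for RangeFunction {
    fn invoke(&self, args: &[Arg]) -> Result<Arc<dyn Provider>, String> {
        // Accept exactly two integer arguments with start <= end.
        match args {
            [Arg::Int(start), Arg::Int(end)] if start <= end => {
                Ok(Arc::new(RangeProvider { start: *start, end: *end }))
            }
            _ => Err("range expects two integers (start <= end)".to_string()),
        }
    }
}
```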
### Describe alternatives you've considered
This API is very powerful and would allow Table Functions to do anything a
table provider does. We could also potentially offer a stripped-down version of
the API.
We could probably add something like
[datafusion](https://docs.rs/datafusion/latest/datafusion/index.html)::[logical_expr](https://docs.rs/datafusion/latest/datafusion/logical_expr/index.html)::[create_udf](https://docs.rs/datafusion/latest/datafusion/logical_expr/fn.create_udf.html#)
to make it easier to construct basic table functions (e.g. ones that produce a
single `SendableRecordBatchStream`).
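A `create_udf`-style helper for the stripped-down API might look like the sketch below. `create_udtf`, `TableFunction`, and `Batch` are hypothetical names, with `Batch` standing in for a `RecordBatch`, string arguments standing in for `Expr`, and a boxed iterator standing in for `SendableRecordBatchStream`. The helper wraps a closure that computes one batch into the full trait, which then yields a single-batch stream.

```rust
use std::sync::Arc;

/// Hypothetical "record batch": column names plus rows of strings.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch {
    pub columns: Vec<String>,
    pub rows: Vec<Vec<String>>,
}

/// Hypothetical full-featured table function trait.
pub trait TableFunction {
    fn name(&self) -> &str;
    fn invoke(&self, args: &[String]) -> Result<Box<dyn Iterator<Item = Batch>>, String>;
}

/// Closure-backed implementation produced by `create_udtf`.
struct SimpleTableFunction<F> {
    name: String,
    f: F,
}

impl<F> TableFunction for SimpleTableFunction<F>
where
    F: Fn(&[String]) -> Result<Batch, String>,
{
    fn name(&self) -> &str {
        &self.name
    }

    fn invoke(&self, args: &[String]) -> Result<Box<dyn Iterator<Item = Batch>>, String> {
        // A "basic" table function yields exactly one batch.
        let batch = (self.f)(args)?;
        Ok(Box::new(std::iter::once(batch)))
    }
}

/// Hypothetical analogue of `create_udf` for table functions.
pub fn create_udtf<F>(name: &str, f: F) -> Arc<dyn TableFunction>
where
    F: Fn(&[String]) -> Result<Batch, String> + 'static,
{
    Arc::new(SimpleTableFunction { name: name.to_string(), f })
}
```

Simple functions only supply a closure, while the richer trait remains available for functions that need pushdown or multiple batches.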
# Option 2: Add to `SchemaProvider`:
We could also add Table Functions to
[datafusion](https://docs.rs/datafusion/latest/datafusion/index.html)::[catalog](https://docs.rs/datafusion/latest/datafusion/catalog/index.html)::[schema](https://docs.rs/datafusion/latest/datafusion/catalog/schema/index.html)::[SchemaProvider](https://docs.rs/datafusion/latest/datafusion/catalog/schema/trait.SchemaProvider.html#).
This might make sense given how similar `TableFunction`s are to
`TableProvider`s:
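```rust
use std::sync::Arc;
use datafusion::datasource::TableProvider;
use datafusion::error::Result;
use datafusion::logical_expr::Expr;

trait SchemaProvider {
    // ... existing methods ...

    /// Return a `TableProvider` for executing the `name` table function,
    /// given the specified arguments
    fn table_function(&self, name: &str, args: &[Expr]) -> Result<Arc<dyn TableProvider>>;

    /// Register the table function `function` under `name`, returning the
    /// previously registered function, if any
    fn register_table_function(
        &self,
        name: &str,
        function: Arc<dyn TableUDF>,
    ) -> Result<Option<Arc<dyn TableUDF>>>;

    // ...
}
```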
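The "return the previously registered function, if any" contract maps directly onto `HashMap::insert`. A minimal, hypothetical in-memory implementation might look like the sketch below; `MemorySchema` and the simplified `TableUdf` trait are illustrative, not DataFusion types, and the `RwLock` is there only because the methods take `&self`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical simplified table function: string args in, rows out.
pub trait TableUdf: Send + Sync {
    fn invoke(&self, args: &[String]) -> Result<Vec<Vec<String>>, String>;
}

/// Hypothetical schema-level registry of table functions.
#[derive(Default)]
pub struct MemorySchema {
    table_functions: RwLock<HashMap<String, Arc<dyn TableUdf>>>,
}

impl MemorySchema {
    /// Register `function` under `name`, returning the previously registered
    /// function, if any (the same contract as `HashMap::insert`).
    pub fn register_table_function(
        &self,
        name: &str,
        function: Arc<dyn TableUdf>,
    ) -> Option<Arc<dyn TableUdf>> {
        self.table_functions.write().unwrap().insert(name.to_string(), function)
    }

    /// Look up `name` and invoke it with `args`; the read lock is released
    /// before the function runs because only a cloned `Arc` is kept.
    pub fn table_function(&self, name: &str, args: &[String]) -> Result<Vec<Vec<String>>, String> {
        let f = self
            .table_functions
            .read()
            .unwrap()
            .get(name)
            .cloned()
            .ok_or_else(|| format!("unknown table function '{name}'"))?;
        f.invoke(args)
    }
}
```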
### Additional context
I thought there was an existing ticket for this, but I cannot find one.
This has come up several times, including:
* https://github.com/apache/arrow-datafusion/issues/6518
* On Discord: https://discord.com/channels/885562378132000778/1166447479609376850/1166702178916892793