metesynnada commented on PR #8021:
URL:
https://github.com/apache/arrow-datafusion/pull/8021#issuecomment-1791978871
> > Instead of creating a new TableProvider for FIFO, we could use a
StreamingTableProvider including its factory
>
> So this is what I started on implementing, but I wasn't really sure how
this interface would materially differ from the existing interfaces used by
this PR. In particular I'm struggling to devise an interface that would
substantially reduce the amount of boilerplate, whilst still preserving the
necessary flexibility. Perhaps I am missing something?
To help with this subject, let's break it down into smaller parts. We can
begin by focusing on the FIFO table and finding ways to optimize the SQL
involved. Additionally, for external tables that are unbounded, we can create
FIFO tables with the proper formatting, starting with CSV and AVRO. When it
comes to session state, we keep `TableProviderFactory` as
```rust
/// This is used to create [`TableProvider`] instances for the
/// `CREATE EXTERNAL TABLE ... STORED AS <FORMAT>` for custom file
/// formats other than those built into DataFusion
table_factories: HashMap<String, Arc<dyn TableProviderFactory>>,
```
which is not scalable. Currently, Datafusion SQL is closely intertwined with
the ListingTable approach, which resulted in FIFO support within the
ListingTable. To avoid this, we need to be able to use multiple
`TableProviderFactory` instances for a single format, by specifying the
connector.
TL:DR, this part should change to scale the support.
```rust
async fn create_custom_table(
&self,
cmd: &CreateExternalTable,
) -> Result<Arc<dyn TableProvider>> {
let state = self.state.read().clone();
let file_type = cmd.file_type.to_uppercase();
let factory =
&state
.table_factories
.get(file_type.as_str())
.ok_or_else(|| {
DataFusionError::Execution(format!(
"Unable to find factory for {}",
cmd.file_type
))
})?;
let table = (*factory).create(&state, cmd).await?;
Ok(table)
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]