metesynnada commented on PR #8021:
URL: 
https://github.com/apache/arrow-datafusion/pull/8021#issuecomment-1791978871

   > > Instead of creating a new TableProvider for FIFO, we could use a 
StreamingTableProvider including its factory
   > 
   > So this is what I started on implementing, but I wasn't really sure how 
this interface would materially differ from the existing interfaces used by 
this PR. In particular I'm struggling to devise an interface that would 
substantially reduce the amount of boilerplate, whilst still preserving the 
necessary flexibility. Perhaps I am missing something?
   
   To help with this subject, let's break it down into smaller parts. We can 
begin by focusing on the FIFO table and finding ways to optimize the SQL 
involved. Additionally, for external tables that are unbounded, we can create 
FIFO tables with the proper formatting, starting with CSV and AVRO. When it 
comes to session state, we keep `TableProviderFactory` as
   
   ```rust
       /// This is used to create [`TableProvider`] instances for the
       /// `CREATE EXTERNAL TABLE ... STORED AS <FORMAT>` for custom file
       /// formats other than those built into DataFusion
       table_factories: HashMap<String, Arc<dyn TableProviderFactory>>,
   ``` 
   which is not scalable. Currently, Datafusion SQL is closely intertwined with 
the ListingTable approach, which resulted in FIFO support within the 
ListingTable. To avoid this, we need to be able to use multiple 
`TableProviderFactory` instances for a single format, by specifying the 
connector.
   
   TL:DR, this part should change to scale the support.
   
   ```rust
   async fn create_custom_table(
           &self,
           cmd: &CreateExternalTable,
       ) -> Result<Arc<dyn TableProvider>> {
           let state = self.state.read().clone();
           let file_type = cmd.file_type.to_uppercase();
           let factory =
               &state
                   .table_factories
                   .get(file_type.as_str())
                   .ok_or_else(|| {
                       DataFusionError::Execution(format!(
                           "Unable to find factory for {}",
                           cmd.file_type
                       ))
                   })?;
           let table = (*factory).create(&state, cmd).await?;
           Ok(table)
       }
   ``` 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to