alamb opened a new issue, #8657:
URL: https://github.com/apache/arrow-datafusion/issues/8657

   ### Is your feature request related to a problem or challenge?
   
   DataFusion has a neat [`ListingTable`](https://docs.rs/datafusion/latest/datafusion/datasource/listing/struct.ListingTable.html) abstraction that can read (and now write) multiple files in a directory, among other features.
   
   DataFusion comes with built-in support for Avro, Parquet, Arrow, CSV, and JSON files.
   
   However, with the introduction of the ability to write to such files, we have inadvertently made it impossible for users to add support for their own formats. This has been reported in several issues:
   
   * https://github.com/apache/arrow-datafusion/issues/8637
   * https://github.com/apache/arrow-datafusion/issues/8345
   
   As pointed out on https://github.com/apache/arrow-datafusion/issues/8637, I think we lost this ability because the [`FileFormat::file_type`](https://docs.rs/datafusion/latest/datafusion/datasource/file_format/trait.FileFormat.html#tymethod.file_type) trait method now returns a [`FileType`](https://docs.rs/datafusion/latest/datafusion/common/enum.FileType.html), which is an enum and hence cannot be extended by downstream crates.
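To make the limitation concrete, here is a minimal sketch using simplified, hypothetical stand-ins for the enum and the trait method (the real DataFusion definitions have different variant names and many more methods). Because Rust enums are closed, a format defined in another crate has no variant of its own to return:

```rust
/// Simplified, hypothetical stand-in for DataFusion's `FileType` enum.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileType {
    Arrow,
    Avro,
    Parquet,
    Csv,
    Json,
}

/// Simplified, hypothetical stand-in for the `FileFormat` trait: every
/// implementation must report one of the enum's fixed variants.
pub trait FileFormat {
    fn file_type(&self) -> FileType;
}

/// A format defined outside DataFusion (hypothetical).
pub struct MyCustomFormat;

impl FileFormat for MyCustomFormat {
    fn file_type(&self) -> FileType {
        // There is no `FileType::MyCustom`, and a downstream crate cannot add
        // one, so the only option is to misreport as a built-in format.
        FileType::Csv
    }
}

fn main() {
    println!("{:?}", MyCustomFormat.file_type());
}
```

Any code that dispatches on the returned variant would then treat the custom format as if it were CSV.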
   
   
   I also have a longer-term goal of extracting `ListingTable` out of the core of DataFusion, as it is just a (very specialized) `TableProvider`.
   
   ### Describe the solution you'd like
   
   I suggest we use traits to make `FileType` extensible, as we have done in other areas of the code.
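A minimal sketch of the trait-based direction, assuming a hypothetical `FileType` trait with a single `extension` method and a hypothetical `FileTypeRegistry`; none of these names are existing DataFusion APIs. The point is that a downstream crate can implement the trait for its own format without modifying DataFusion:

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical trait replacing the closed `FileType` enum.
pub trait FileType: Send + Sync {
    /// Default file extension, without the leading dot (e.g. "parquet").
    fn extension(&self) -> &str;
}

// A built-in format implements the trait...
pub struct Parquet;

impl FileType for Parquet {
    fn extension(&self) -> &str {
        "parquet"
    }
}

// ...and so can a format defined in a downstream crate.
pub struct MyFormat;

impl FileType for MyFormat {
    fn extension(&self) -> &str {
        "myfmt"
    }
}

/// Hypothetical registry that maps extensions to file types at runtime.
#[derive(Default)]
pub struct FileTypeRegistry {
    types: HashMap<String, Arc<dyn FileType>>,
}

impl FileTypeRegistry {
    pub fn register(&mut self, file_type: Arc<dyn FileType>) {
        let key = file_type.extension().to_string();
        self.types.insert(key, file_type);
    }

    pub fn get(&self, extension: &str) -> Option<Arc<dyn FileType>> {
        self.types.get(extension).cloned()
    }
}

fn main() {
    let mut registry = FileTypeRegistry::default();
    registry.register(Arc::new(Parquet));
    registry.register(Arc::new(MyFormat));
    for ext in ["parquet", "myfmt", "orc"] {
        println!("{ext}: registered = {}", registry.get(ext).is_some());
    }
}
```

Keying the registry by extension is just one choice for this sketch; a name-based key would work equally well.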
   
   When this is done, we should also add an end-to-end test case / example showing how users can add support for their own custom file formats in `ListingTable`, so that we don't regress this functionality again in the future.
   
   ### Describe alternatives you've considered
   
   One potential design is to make 
[`FileType`](https://docs.rs/datafusion/latest/datafusion/common/enum.FileType.html)
 a `trait` rather than an `enum`.
   
   I looked briefly into this, and it will likely require:
   1. Converting other structures like `FileTypeWriterOptions` into traits (or incorporating them into the `FileType` trait).
   2. Sorting out how to handle serialization, as pointed out by @tustvold on https://github.com/apache/arrow-datafusion/issues/8345#issuecomment-1837478181
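Part of what makes serialization tricky is that a `dyn` trait object cannot be decoded without knowing its concrete type. One common workaround, sketched here under the assumption of a hypothetical `FileType` trait with `name`/`encode` methods and a hypothetical `DecoderRegistry` (not DataFusion APIs), is to serialize a type name plus opaque bytes and have the receiving side look up a user-registered decoder:

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical extensible file type that can describe itself as a
/// (name, bytes) pair so that a plan containing it can be serialized.
pub trait FileType: Send + Sync {
    fn name(&self) -> &str;
    fn encode(&self) -> Vec<u8>;
}

/// Hypothetical user-defined format with a single writer option.
pub struct MyFormat {
    pub delimiter: u8,
}

impl FileType for MyFormat {
    fn name(&self) -> &str {
        "my_format"
    }
    fn encode(&self) -> Vec<u8> {
        vec![self.delimiter]
    }
}

/// Rebuilds a concrete file type from the bytes produced by `encode`.
type Decoder = fn(&[u8]) -> Result<Arc<dyn FileType>, String>;

fn decode_my_format(bytes: &[u8]) -> Result<Arc<dyn FileType>, String> {
    match bytes {
        [delimiter] => Ok(Arc::new(MyFormat { delimiter: *delimiter })),
        _ => Err(format!("expected 1 byte, got {}", bytes.len())),
    }
}

/// The receiving side must know every custom type up front, so users
/// register one decoder per type name.
#[derive(Default)]
pub struct DecoderRegistry {
    decoders: HashMap<String, Decoder>,
}

impl DecoderRegistry {
    pub fn register(&mut self, name: &str, decoder: Decoder) {
        self.decoders.insert(name.to_string(), decoder);
    }

    pub fn decode(&self, name: &str, bytes: &[u8]) -> Result<Arc<dyn FileType>, String> {
        let decoder = self
            .decoders
            .get(name)
            .copied()
            .ok_or_else(|| format!("unknown file type: {name}"))?;
        decoder(bytes)
    }
}

fn main() {
    let original = MyFormat { delimiter: b';' };
    let mut registry = DecoderRegistry::default();
    registry.register(original.name(), decode_my_format);

    // "Serialize" as (name, bytes), then decode on the other side.
    let (name, bytes) = (original.name().to_string(), original.encode());
    match registry.decode(&name, &bytes) {
        Ok(ft) => println!("decoded {} with options {:?}", ft.name(), ft.encode()),
        Err(e) => println!("error: {e}"),
    }
}
```

The trade-off in this design is that decoding fails for any type the receiver has not registered, so both sides must be configured with the same custom formats.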
   
   A slightly different alternative design would be to incorporate all the functionality of `FileType` into the existing [`FileFormat`](https://docs.rs/datafusion/latest/datafusion/datasource/file_format/trait.FileFormat.html) trait, as suggested by @devinjdangelo on https://github.com/apache/arrow-datafusion/issues/8345#issuecomment-1837146453
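A minimal sketch of that alternative, assuming a hypothetical, heavily simplified `FileFormat` trait that absorbs the one piece of `FileType` information used here (the default file extension) and a hypothetical `SimpleListingOptions` stand-in; the real `FileFormat` trait has many more methods, and `default_extension` is an illustrative name. Table-level code then only ever holds an `Arc<dyn FileFormat>`, with no enum left to extend:

```rust
use std::sync::Arc;

// Hypothetical, heavily simplified `FileFormat` trait: the information the
// `FileType` enum used to provide (here only the default extension) lives on
// the format itself.
pub trait FileFormat: Send + Sync {
    fn default_extension(&self) -> String;
}

pub struct CsvFormat;

impl FileFormat for CsvFormat {
    fn default_extension(&self) -> String {
        ".csv".to_string()
    }
}

// A user-defined format (hypothetical) plugs in the same way as built-in ones.
pub struct MyFormat;

impl FileFormat for MyFormat {
    fn default_extension(&self) -> String {
        ".myfmt".to_string()
    }
}

/// Hypothetical stand-in for listing options: it stores only the format,
/// never a `FileType`.
pub struct SimpleListingOptions {
    pub format: Arc<dyn FileFormat>,
}

impl SimpleListingOptions {
    pub fn new(format: Arc<dyn FileFormat>) -> Self {
        Self { format }
    }

    /// Keep only the paths whose extension matches the configured format.
    pub fn matching<'a>(&self, paths: &[&'a str]) -> Vec<&'a str> {
        let ext = self.format.default_extension();
        paths.iter().copied().filter(|p| p.ends_with(&ext)).collect()
    }
}

fn main() {
    let paths = ["a.myfmt", "b.csv", "c.myfmt"];
    let custom = SimpleListingOptions::new(Arc::new(MyFormat));
    let csv = SimpleListingOptions::new(Arc::new(CsvFormat));
    println!("{:?}", custom.matching(&paths));
    println!("{:?}", csv.matching(&paths));
}
```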
   
   
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.