alamb opened a new issue, #8657: URL: https://github.com/apache/arrow-datafusion/issues/8657
### Is your feature request related to a problem or challenge?

DataFusion has a neat [`ListingTable`](https://docs.rs/datafusion/latest/datafusion/datasource/listing/struct.ListingTable.html) abstraction that offers the ability to read (and now write) multiple files in a directory (among other features).

DataFusion comes with built-in support for Avro, Parquet, Arrow, CSV, and JSON files. However, with the introduction of the ability to write to such files, we have inadvertently made it impossible for users to add support for their own formats, which has been identified in several reports:

* https://github.com/apache/arrow-datafusion/issues/8637
* https://github.com/apache/arrow-datafusion/issues/8345

I think we lost this ability, as pointed out on https://github.com/apache/arrow-datafusion/issues/8637, because the [`FileFormat::file_type`](https://docs.rs/datafusion/latest/datafusion/datasource/file_format/trait.FileFormat.html#tymethod.file_type) trait method now returns a [`FileType`](https://docs.rs/datafusion/latest/datafusion/common/enum.FileType.html), which is an enum and hence cannot be extended.

I also have a longer-term goal of extracting the listing table out of the core of DataFusion (as it is just a very specialized `TableProvider`).

### Describe the solution you'd like

I suggest we use traits to make `FileType` extensible, as we have done in other areas of the code.

When this is done, we should also add an end-to-end test case / example showing how a user can add support for their own custom file formats in `ListingTable`, so that we don't cause a regression in functionality like this again in the future.

### Describe alternatives you've considered

One potential design is to make [`FileType`](https://docs.rs/datafusion/latest/datafusion/common/enum.FileType.html) a `trait` rather than an `enum`. I looked briefly into this, and it will likely require:

1. Converting other structures like `FileTypeWriterOptions` into traits (or incorporating them into the `FileType` trait).
2. Sorting out how to handle serialization, as pointed out by @tustvold on https://github.com/apache/arrow-datafusion/issues/8345#issuecomment-1837478181

Another, slightly different design would be to incorporate all the functionality of `FileType` into the existing [`FileFormat`](https://docs.rs/datafusion/latest/datafusion/datasource/file_format/trait.FileFormat.html) trait, as suggested by @devinjdangelo on https://github.com/apache/arrow-datafusion/issues/8345#issuecomment-1837146453

### Additional context

_No response_

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
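
To make the "`FileType` as a `trait` rather than an `enum`" alternative from the issue more concrete, here is a minimal, self-contained sketch. It does not use DataFusion's actual API: the trait shape, the `CsvFileType` and `MyFormatFileType` structs, and the `FileTypeRegistry` are all hypothetical names introduced for illustration. The registry is only one conceivable way to map a serialized name or extension back to an implementation; it is not the approach proposed in the linked comments.

```rust
use std::collections::HashMap;
use std::fmt::Debug;
use std::path::Path;
use std::sync::Arc;

/// Hypothetical open replacement for a closed `FileType` enum.
/// Anyone can implement it, including code outside the core crate.
pub trait FileType: Debug + Send + Sync {
    /// Human-readable name, also usable as a serialization key.
    fn name(&self) -> &str;
    /// File extension without the leading dot, e.g. "csv".
    fn extension(&self) -> &str;
}

/// Stand-in for a built-in format.
#[derive(Debug)]
pub struct CsvFileType;

impl FileType for CsvFileType {
    fn name(&self) -> &str {
        "CSV"
    }
    fn extension(&self) -> &str {
        "csv"
    }
}

/// Stand-in for a user-defined format (hypothetical).
#[derive(Debug)]
pub struct MyFormatFileType;

impl FileType for MyFormatFileType {
    fn name(&self) -> &str {
        "MYFMT"
    }
    fn extension(&self) -> &str {
        "myfmt"
    }
}

/// Hypothetical registry that resolves a path to a registered file type
/// by its extension. Built-in and user formats are treated identically.
#[derive(Debug, Default)]
pub struct FileTypeRegistry {
    by_extension: HashMap<String, Arc<dyn FileType>>,
}

impl FileTypeRegistry {
    /// Registers a file type under its own extension, replacing any
    /// previous entry for that extension.
    pub fn register(&mut self, file_type: Arc<dyn FileType>) {
        self.by_extension
            .insert(file_type.extension().to_string(), file_type);
    }

    /// Returns the file type registered for the extension of `path`,
    /// or `None` if the path has no extension or it is unknown.
    pub fn lookup(&self, path: &str) -> Option<Arc<dyn FileType>> {
        let ext = Path::new(path).extension()?.to_str()?;
        self.by_extension.get(ext).cloned()
    }
}
```

With this shape, a user would register `Arc::new(MyFormatFileType)` next to the built-in types and resolve files through `lookup`, without any change to the crate that defines the trait. An enum would instead require adding a variant in the core crate, which is the limitation described above.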
