alamb commented on issue #8345: URL: https://github.com/apache/arrow-datafusion/issues/8345#issuecomment-1837472791
BTW I really love the idea of a trait-based implementation of file formats, as it would permit (eventually) pulling the format support (and its dependencies) into separate crates and would make the codebase more manageable.

> I took a brief look at where we are matching on the FileType enum, and I don't see any reason why those sections could not be refactored as method calls on a trait object. It would probably make the most sense to add that functionality to FileFormat and consolidate all file related abstractions under one trait.

I agree this sounds like a good idea. If we want to head down this route, I would suggest making a Proof of Concept (POC) PR that adds the trait and shows how it might work for one of the formats (you could temporarily add a new `FileType::Dynamic(Arc<dyn FileTypeTrait>)` variant).

The POC would help figure out which areas might need additional work or that we haven't figured out yet.

> As for the distinction between the enumeration and trait, serialization might be part of the reason behind this design. It is certainly part of the rationale behind using ListingTableUrl to provide a serializable version of ObjectStore

@tustvold are you referring to

1. plan serialization (e.g. [datafusion-proto](https://docs.rs/datafusion-proto/latest/datafusion_proto/))?
2. data serialization (e.g. RecordBatches --> Parquet bytes)?
3. something else?
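
To make the temporary `FileType::Dynamic(Arc<dyn FileTypeTrait>)` idea above a bit more concrete, here is a rough sketch of the shape a POC might take. Only the variant and the trait name come from the suggestion itself. The `extension` method, the two-variant `FileType` stand-in, and `TsvFileType` are all hypothetical and simplified for illustration; this is not DataFusion's actual definition.

```rust
use std::fmt::Debug;
use std::sync::Arc;

/// Hypothetical trait. The name is taken from the suggestion above;
/// the single method is an assumption chosen only for illustration.
pub trait FileTypeTrait: Debug + Send + Sync {
    /// File extension including the leading dot, e.g. ".tsv".
    fn extension(&self) -> String;
}

/// Simplified stand-in for the existing enum (NOT the real DataFusion type).
#[derive(Debug, Clone)]
pub enum FileType {
    Csv,
    Parquet,
    /// Temporary escape hatch: a format implemented outside the enum.
    Dynamic(Arc<dyn FileTypeTrait>),
}

impl FileType {
    /// Built-in variants keep their `match` arm, while the dynamic
    /// variant forwards the call to the trait object.
    pub fn extension(&self) -> String {
        match self {
            FileType::Csv => ".csv".to_string(),
            FileType::Parquet => ".parquet".to_string(),
            FileType::Dynamic(inner) => inner.extension(),
        }
    }
}

/// Hypothetical format that could live in a separate crate.
#[derive(Debug)]
pub struct TsvFileType;

impl FileTypeTrait for TsvFileType {
    fn extension(&self) -> String {
        ".tsv".to_string()
    }
}
```

With something like this, `FileType::Dynamic(Arc::new(TsvFileType)).extension()` goes through the trait object while `FileType::Csv.extension()` still uses the existing match arm, so the formats could be migrated one at a time. Whether a trait object like this can round-trip through plan serialization is left open here, which is the question for @tustvold above.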
