alamb commented on issue #8345:
URL: https://github.com/apache/arrow-datafusion/issues/8345#issuecomment-1837472791

   BTW I really love the idea of a trait-based implementation of file formats 
as it would permit (eventually) pulling the format support (and its 
dependencies) into separate crates and would make the codebase more manageable. 
   
   > I took a brief look at where we are matching on the FileType enum, and I 
don't see any reason why those sections could not be refactored as method calls 
on a trait object. It would probably make the most sense to add that 
functionality to FileFormat and consolidate all file related abstractions under 
one trait.
   
   I agree this sounds like a good idea. If we want to head down this route, I 
would suggest making a Proof of Concept (POC) PR that adds the trait and shows 
how it might work for one of the formats (you could temporarily add a new 
`FileType::Dynamic(Arc<dyn FileTypeTrait>)` variant).
   
   The POC would help identify areas that need additional work or that 
we haven't figured out yet.
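
   For illustration, here is a minimal sketch of the shape such a POC could 
take. The enum below is a simplified stand-in for the real `FileType`, and the 
`extension` method and `TsvFileType` struct are hypothetical, chosen only to 
show how a `Dynamic` variant could dispatch to a trait object:

```rust
use std::fmt::Debug;
use std::sync::Arc;

/// Hypothetical trait for user-defined file types. The name comes from the
/// suggestion above; the single method is an assumption for illustration
/// and is not DataFusion's actual API.
pub trait FileTypeTrait: Debug + Send + Sync {
    /// Default file extension for this format, without the leading dot.
    fn extension(&self) -> String;
}

/// Simplified stand-in for the existing `FileType` enum.
#[derive(Debug, Clone)]
pub enum FileType {
    Csv,
    Parquet,
    /// Temporary variant for the POC: defers to a trait object.
    Dynamic(Arc<dyn FileTypeTrait>),
}

impl FileType {
    /// Built-in variants keep their match arms; `Dynamic` forwards the call.
    pub fn extension(&self) -> String {
        match self {
            FileType::Csv => "csv".to_string(),
            FileType::Parquet => "parquet".to_string(),
            FileType::Dynamic(inner) => inner.extension(),
        }
    }
}

/// Hypothetical format living outside the enum (e.g. in a separate crate).
#[derive(Debug)]
pub struct TsvFileType;

impl FileTypeTrait for TsvFileType {
    fn extension(&self) -> String {
        "tsv".to_string()
    }
}
```

   With this shape, `FileType::Dynamic(Arc::new(TsvFileType)).extension()` goes 
through the trait object, while the built-in variants keep their existing 
match arms until they are migrated one at a time.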
   
   
   > As for the distinction between the enumeration and trait, serialization 
might be part of the reason behind this design. It is certainly part of the 
rationale behind using ListingTableUrl to provide a serializable version of 
ObjectStore
   
   @tustvold are you referring to:
   1. plan serialization (e.g. 
[datafusion-proto](https://docs.rs/datafusion-proto/latest/datafusion_proto/))
   2. data serialization (e.g. RecordBatches --> Parquet bytes)
   3. something else?

