devinjdangelo commented on issue #8345: URL: https://github.com/apache/arrow-datafusion/issues/8345#issuecomment-1837146453
> I guess my first question is "are FileType and FileFormat actually meaningfully distinct?" Do we imagine that there are cases where you'd want to have different FileType values but the same FileFormat? Seems unlikely to me but I don't know.

I don't see any fundamental reason why there should be multiple file-format-related abstractions. I think it is mostly a product of development progressing at the same time on different parts of the codebase that deal with files. I know I was guilty at one point of creating a third file-related enum, which was later consolidated into FileType.

I took a brief look at where we match on the FileType enum, and I don't see any reason why those sections could not be refactored into method calls on a trait object. It would probably make the most sense to add that functionality to FileFormat and consolidate all file-related abstractions under one trait.

> Second very high level question: is there anything to be done in ListingTable separately from either "FileType as a trait" or flattening FileType into FileFormat.

I'm not sure, but I think it makes a lot of sense to handle file-type-specific operations through a single trait object, similar to how we currently do with ObjectStore.

> Secondly, (and this is somewhat orthogonal) but for the project that brought me to this point; I'm leaning towards just implementing StreamingTable (rather than FileFormat or a TableProvider from the ground up, something approximating it?) For things like JSON and CSV this seems maybe more ideal? Though of course, I'm new and don't know all of the innards. Obviously this is very format specific.

I am not very familiar with the new StreamingTable, but it appears to be a struct rather than a trait. StreamingTable itself implements TableProvider, so I'm not quite sure how it helps here. @tustvold may know better than I do on this, though.
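To make the "match on an enum" versus "method call on a trait object" idea concrete, here is a minimal Rust sketch. It is not DataFusion code: `FileTypeSketch`, `FileFormatSketch`, the `extension` method, and the `output_path_*` helpers are all hypothetical names invented for illustration, and the real FileType and FileFormat have different and larger surfaces.

```rust
// Hypothetical stand-in for an enum like FileType (not the real DataFusion type).
#[derive(Debug, Clone, Copy)]
enum FileTypeSketch {
    Csv,
    Json,
    Parquet,
}

// Enum-based dispatch: every call site that needs format-specific behavior
// has to match on the variant, so adding a format means touching each match.
fn output_path_via_match(file_type: FileTypeSketch, stem: &str) -> String {
    let ext = match file_type {
        FileTypeSketch::Csv => ".csv",
        FileTypeSketch::Json => ".json",
        FileTypeSketch::Parquet => ".parquet",
    };
    format!("{}{}", stem, ext)
}

// Hypothetical stand-in for a trait like FileFormat: the format-specific
// behavior lives behind a method instead of a match.
trait FileFormatSketch {
    fn extension(&self) -> &'static str;
}

struct CsvSketch;
struct JsonSketch;
struct ParquetSketch;

impl FileFormatSketch for CsvSketch {
    fn extension(&self) -> &'static str {
        ".csv"
    }
}

impl FileFormatSketch for JsonSketch {
    fn extension(&self) -> &'static str {
        ".json"
    }
}

impl FileFormatSketch for ParquetSketch {
    fn extension(&self) -> &'static str {
        ".parquet"
    }
}

// Trait-object dispatch: the call site no longer knows which formats exist.
fn output_path_via_trait(format: &dyn FileFormatSketch, stem: &str) -> String {
    format!("{}{}", stem, format.extension())
}
```

With this shape, a call such as `output_path_via_trait(&JsonSketch, "part-0")` replaces `output_path_via_match(FileTypeSketch::Json, "part-0")`, and a new format only needs a new `impl` block rather than edits to every match.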
