devinjdangelo commented on issue #8345:
URL: 
https://github.com/apache/arrow-datafusion/issues/8345#issuecomment-1837146453

   > I guess my first question is "are FileType and FileFormat actually 
meaningfully distinct?" Do we imagine that there are cases where you'd want to 
have different FileType values but the same FileFormat? Seems unlikely to me 
but I don't know.
   
   I don't see any fundamental reason why there should be multiple 
file-format-related abstractions. I think it is mostly a product of different 
parts of the file-handling code being developed at the same time. I know I was 
guilty at one point of creating a third file-related enum, which was later 
consolidated into FileType. 
   
   I took a brief look at where we are matching on the FileType enum, and I 
don't see any reason why those sections could not be refactored as method calls 
on a trait object. It would probably make the most sense to add that 
functionality to FileFormat and consolidate all file-related abstractions under 
one trait. 
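   
   To make the refactor above concrete, here is a minimal sketch of the two 
styles side by side. Every name in it (FileTypeSketch, FileFormatSketch, 
CsvSketch, ParquetSketch, ListingSketch, extension) is hypothetical and chosen 
only for illustration; none of it is DataFusion's actual API, and the real 
FileFormat trait has a different set of methods.

```rust
use std::sync::Arc;

// NOTE: all names below are hypothetical, for illustration only.

/// Enum style: every call site that needs format-specific behaviour
/// has to match on the variant, so adding a format touches each match.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FileTypeSketch {
    Csv,
    Parquet,
}

pub fn extension_via_match(file_type: FileTypeSketch) -> &'static str {
    match file_type {
        FileTypeSketch::Csv => ".csv",
        FileTypeSketch::Parquet => ".parquet",
    }
}

/// Trait style: the format-specific behaviour lives behind one trait,
/// so callers invoke a method instead of matching.
pub trait FileFormatSketch: Send + Sync {
    fn extension(&self) -> &'static str;
}

pub struct CsvSketch;
pub struct ParquetSketch;

impl FileFormatSketch for CsvSketch {
    fn extension(&self) -> &'static str {
        ".csv"
    }
}

impl FileFormatSketch for ParquetSketch {
    fn extension(&self) -> &'static str {
        ".parquet"
    }
}

/// A consumer that holds a single trait object and never branches
/// on the concrete format.
pub struct ListingSketch {
    format: Arc<dyn FileFormatSketch>,
}

impl ListingSketch {
    pub fn new(format: Arc<dyn FileFormatSketch>) -> Self {
        Self { format }
    }

    /// Builds an output file name by delegating to the trait object.
    pub fn output_file_name(&self, stem: &str) -> String {
        format!("{}{}", stem, self.format.extension())
    }
}
```

   With this shape, supporting a new format means adding one more impl of the 
trait, and ListingSketch does not need to change.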
   
   > Second very high level question: is there anything to be done in 
ListingTable separately from either "FileType as a trait" or flattening FileType 
into FileFormat?
   
   I'm not sure, but I think it makes a lot of sense to handle 
file-type-specific operations through a single trait object, similar to how we 
currently do with ObjectStore. 
   
   > Secondly, (and this is somewhat orthogonal) but for the project that 
brought me to this point; I'm leaning towards just implementing 
StreamingTable (rather than FileFormat or a TableProvider from the ground up, 
something approximating it?) For things like JSON and CSV this seems maybe more 
ideal? Though of course, I'm new and don't know all of the innards. Obviously 
this is very format specific.
   
   I am not very familiar with the new StreamingTable, but it appears to be a 
struct rather than a trait. StreamingTable itself implements TableProvider, so 
I'm not quite sure how it helps here. @tustvold may know more about this than 
I do, though.

