[GitHub] [arrow-datafusion] Cheappie commented on issue #4533: FileStream requires fake ObjectStore when ParquetFileReaderFactory is used

GitBox Wed, 14 Dec 2022 05:17:57 -0800


Cheappie commented on issue #4533:
URL: 
https://github.com/apache/arrow-datafusion/issues/4533#issuecomment-1351339352


   >>> What do you think about moving schema inference into scan and removing 
it from the TableProvider trait?
   
   >> I don't think this is possible, as planning needs to know the schema. In 
general though performing schema inference per query is very expensive, 
especially for non-parquet data. I strongly recommend investing in some sort of 
catalog to store this data.
   
   > I thought that maybe the schema could be kept (cached) in TableProvider 
implementations but exposed only through the scan operation. For 
example, FileScanConfig holds a reference to the schema. Sure, it could be 
solved differently, but in my case schema inference is not as expensive as it 
might seem.
   
   Bump, what do you think? In that case, schema inference would be an 
implementation detail of the scan operation; the schema could be inferred 
ahead of time or on every query.
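
   A minimal sketch of the idea, using hypothetical stand-in types 
(`Schema`, `ScanPlan`, `CachedTable`) rather than DataFusion's actual 
TableProvider API. The provider infers the schema lazily, caches it, and 
exposes it only through the plan returned by `scan`; the `infer_schema` body 
is a placeholder for real inference:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};

// Hypothetical stand-ins for illustration; these are NOT DataFusion types.
#[derive(Debug, PartialEq)]
pub struct Schema {
    pub fields: Vec<String>,
}

/// Plan node returned by `scan`. It carries the schema, in the same spirit
/// as FileScanConfig holding a reference to one.
pub struct ScanPlan {
    pub schema: Arc<Schema>,
}

pub struct CachedTable {
    // Filled on the first scan, reused afterwards.
    schema: OnceLock<Arc<Schema>>,
    // Counts how many times inference actually ran.
    inferences: AtomicUsize,
}

impl CachedTable {
    pub fn new() -> Self {
        Self {
            schema: OnceLock::new(),
            inferences: AtomicUsize::new(0),
        }
    }

    // Placeholder for real (potentially expensive) inference from files.
    fn infer_schema(&self) -> Arc<Schema> {
        self.inferences.fetch_add(1, Ordering::SeqCst);
        Arc::new(Schema {
            fields: vec!["id".to_string(), "value".to_string()],
        })
    }

    /// The schema is not exposed on the provider itself, only via the plan.
    pub fn scan(&self) -> ScanPlan {
        let schema = self.schema.get_or_init(|| self.infer_schema()).clone();
        ScanPlan { schema }
    }

    pub fn inference_count(&self) -> usize {
        self.inferences.load(Ordering::SeqCst)
    }
}
```

   Dropping the `OnceLock` and calling `infer_schema` directly inside `scan` 
would give the "infer on every query" variant instead.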


