tustvold commented on issue #4625:
URL: 
https://github.com/apache/arrow-datafusion/issues/4625#issuecomment-1351818734

   > Maybe I need to find a way to have ListingTable pre-populated with the 
stats before the optimizer is invoked
   
   I see no issue with pre-fetching statistics for `ListingTable` if configured 
to do so. This will be expensive, but there isn't really a way around this. If 
people have large numbers of parquet files, they should probably invest in a 
proper catalog and not be using `ListingTable`.
   
   This would, however, also entail baking the list of files in at creation 
time, which I think would be a behavior change, although I personally think it 
is an improvement.
   
   One thing that wasn't clear to me when I tried to do something similar the 
other day is how to combine Statistics together. I think this would also be a 
prerequisite, as Statistics are currently collected per file and would need to 
be aggregated per table.
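   
   As a rough illustration of what per-file to per-table aggregation might 
involve, here is a minimal sketch. `FileStats`, `merge` and `aggregate` are 
hypothetical names over a simplified stand-in type, not DataFusion's actual 
`Statistics` struct, and the sketch assumes every file shares the same column 
layout. Any value that is unknown in one file is treated as unknown for the 
table.

```rust
// Hypothetical, simplified stand-in for per-file statistics.
// This is NOT DataFusion's `Statistics` type.
#[derive(Debug, Clone, PartialEq)]
struct FileStats {
    num_rows: Option<usize>,
    total_byte_size: Option<usize>,
    // Per-column (min, max), restricted to i64 for simplicity; None = unknown.
    column_min_max: Vec<Option<(i64, i64)>>,
}

// Sum two optional counts; the result is unknown if either input is unknown.
fn add_opt(a: Option<usize>, b: Option<usize>) -> Option<usize> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x + y),
        _ => None,
    }
}

// Combine the statistics of two files.
// Assumes both files have the same columns in the same order.
fn merge(a: &FileStats, b: &FileStats) -> FileStats {
    let column_min_max = a
        .column_min_max
        .iter()
        .zip(b.column_min_max.iter())
        .map(|(x, y)| match (x, y) {
            // Widen the range to cover both files.
            (Some((lo1, hi1)), Some((lo2, hi2))) => {
                Some(((*lo1).min(*lo2), (*hi1).max(*hi2)))
            }
            // Unknown in either file means unknown for the combination.
            _ => None,
        })
        .collect();

    FileStats {
        num_rows: add_opt(a.num_rows, b.num_rows),
        total_byte_size: add_opt(a.total_byte_size, b.total_byte_size),
        column_min_max,
    }
}

// Fold per-file statistics into one table-level value.
// Returns None when there are no files at all.
fn aggregate(files: &[FileStats]) -> Option<FileStats> {
    let (first, rest) = files.split_first()?;
    Some(rest.iter().fold(first.clone(), |acc, f| merge(&acc, f)))
}
```

   Treating a missing value as "unknown" rather than zero keeps the 
table-level numbers from silently understating row counts or value ranges.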
   
   > to ListingTable is that I need a SessionContext and of course, optimizer 
rules do not have access to that.
   
   Given the nature of ListingTable, I think moving it away from needing 
SessionState is going to be fairly hard.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
