tustvold commented on issue #4625: URL: https://github.com/apache/arrow-datafusion/issues/4625#issuecomment-1351818734
> Maybe I need to find a way to have ListingTable pre-populated with the stats before the optimizer is invoked

I see no issue with pre-fetching statistics for `ListingTable` if configured to do so. This will be expensive, but there isn't really a way around this. If people have large numbers of parquet files, they should probably invest in a proper catalog and not be using `ListingTable`.

This would, however, entail also baking the list of files in at creation time, which I think would be a behavior change, although I personally think it is an improvement.

One thing that wasn't clear to me when I tried to do something similar the other day is how to combine Statistics together. I think this would also be a prerequisite for this, as Statistics are currently collected per file and would need to be aggregated to per-table.

> to ListingTable is that I need a SessionContext and of course, optimizer rules do not have access to that.

Given the nature of ListingTable I think moving it away from needing SessionState is going to be fairly hard
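
On the point about combining per-file statistics into per-table statistics, here is a minimal sketch of one way the aggregation could work. It is not DataFusion's API: `FileStats`, `ColumnStats`, `merge`, and `aggregate` are hypothetical, simplified stand-ins (with `i64` column bounds) chosen only to illustrate the merge rules. The assumption is that a value unknown in any one file makes the table-level value unknown, since a sum or bound computed from partial information could be wrong.

```rust
/// Hypothetical, simplified per-column statistics (NOT DataFusion's type).
#[derive(Debug, Clone, PartialEq)]
struct ColumnStats {
    min: Option<i64>,
    max: Option<i64>,
    null_count: Option<usize>,
}

/// Hypothetical, simplified per-file statistics (NOT DataFusion's `Statistics`).
#[derive(Debug, Clone, PartialEq)]
struct FileStats {
    num_rows: Option<usize>,
    total_byte_size: Option<usize>,
    columns: Vec<ColumnStats>,
}

/// Sum two optional counts; the result is unknown if either input is
/// unknown or if the addition would overflow.
fn add_opt(a: Option<usize>, b: Option<usize>) -> Option<usize> {
    a?.checked_add(b?)
}

/// Merge statistics for the same column from two files.
fn merge_columns(a: &ColumnStats, b: &ColumnStats) -> ColumnStats {
    ColumnStats {
        // A bound is only known if it is known for both inputs.
        min: a.min.zip(b.min).map(|(x, y)| x.min(y)),
        max: a.max.zip(b.max).map(|(x, y)| x.max(y)),
        null_count: add_opt(a.null_count, b.null_count),
    }
}

/// Merge the statistics of two files into one combined value.
fn merge(a: &FileStats, b: &FileStats) -> FileStats {
    // If the column counts disagree, column-level statistics are dropped
    // rather than guessed.
    let columns = if a.columns.len() == b.columns.len() {
        a.columns
            .iter()
            .zip(&b.columns)
            .map(|(x, y)| merge_columns(x, y))
            .collect()
    } else {
        Vec::new()
    };
    FileStats {
        num_rows: add_opt(a.num_rows, b.num_rows),
        total_byte_size: add_opt(a.total_byte_size, b.total_byte_size),
        columns,
    }
}

/// Fold per-file statistics into a single per-table value.
/// Returns `None` when there are no files.
fn aggregate(files: &[FileStats]) -> Option<FileStats> {
    let (first, rest) = files.split_first()?;
    Some(rest.iter().fold(first.clone(), |acc, f| merge(&acc, f)))
}
```

Because `merge` is associative under these rules, the fold could in principle be done incrementally as files are listed, though whether that fits `ListingTable` is a separate question.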
