mateuszkj opened a new pull request, #3649:
URL: https://github.com/apache/arrow-datafusion/pull/3649
# Which issue does this PR close?
Closes #871 (but I'm not sure about that).
# Rationale for this change
Reduce I/O by collecting statistics for each (Parquet) file only once in
`ListingTable`.
# What changes are included in this PR?
Store collected statistics in a cache keyed by file location.
A cache entry is invalidated when:
- The file size has changed
- The file's last-modified time has changed
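
To illustrate the invalidation rule above, here is a minimal, self-contained sketch. It is not the code in this PR: `FileMeta`, `FileStatistics`, and `StatisticsCache` are hypothetical stand-ins invented for the example, and the real types in DataFusion may differ. A lookup only counts as a hit when both the size and the last-modified time match what was stored.

```rust
use std::collections::HashMap;
use std::time::SystemTime;

/// Hypothetical stand-in for per-file metadata (not a DataFusion type).
#[derive(Clone, Debug, PartialEq)]
struct FileMeta {
    location: String,
    size: u64,
    last_modified: SystemTime,
}

/// Hypothetical stand-in for the statistics collected from a file.
#[derive(Clone, Debug, PartialEq)]
struct FileStatistics {
    num_rows: usize,
}

/// Cache keyed by file location. Each entry remembers the metadata
/// that was current when the statistics were collected.
#[derive(Default)]
struct StatisticsCache {
    entries: HashMap<String, (FileMeta, FileStatistics)>,
}

impl StatisticsCache {
    /// Returns cached statistics only if size and last-modified time
    /// are unchanged; otherwise the entry is treated as stale.
    fn get(&self, meta: &FileMeta) -> Option<&FileStatistics> {
        self.entries
            .get(&meta.location)
            .filter(|entry| {
                entry.0.size == meta.size && entry.0.last_modified == meta.last_modified
            })
            .map(|(_, stats)| stats)
    }

    /// Stores (or overwrites) the statistics for a file location.
    fn put(&mut self, meta: FileMeta, stats: FileStatistics) {
        self.entries.insert(meta.location.clone(), (meta, stats));
    }
}
```

On a miss, the caller would collect statistics again and call `put`, which overwrites the stale entry for that location.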
# Are there any user-facing changes?
No.
Or maybe worth mentioning: when `collect_stats` is enabled, the first query
can be much slower because of the extra I/O needed to collect statistics.
Cached statistics for a file are invalidated on the next query after that
file has changed.