alamb commented on code in PR #17031:
URL: https://github.com/apache/datafusion/pull/17031#discussion_r2254953311
##########
Cargo.toml:
##########
@@ -153,6 +153,7 @@ hex = { version = "0.4.3" }
indexmap = "2.10.0"
itertools = "0.14"
log = "^0.4"
+lru = "0.16.0"
Review Comment:
I spoke with @XiangpengHao this afternoon and we have a proposal:
1. Use a FIFO queue -- aka evict the oldest elements first (e.g. store
`Path`s in a `VecDeque` or something and remove them from the HashSet in order
when the cache is full)
Benefits of this approach
1. Simple to implement
2. Predictable, easy to explain behavior
Downsides:
1. Can evict item that are frequently used
2. Doesn't account for how costly an item is to keep in the cache
However, I think this is an appropriate tradeoff for the default
implementation because
1. a properly sized cache will have no evictions (so the eviction strategy
for the simple case doesn't matter)
1. users can supply their own implementations for more advanced usecases
I argue against including anything more complicated in the initial
implementation because of the many tradeoffs. A cache eviction strategy needs
to weigh multiple different factors, for example
1. The cost (eg. size in bytes) of keeping elements in the cache
3. The benefit (e.g. how much will the cached metadata improve future
queries)
4. The cost of next miss (e.g. the cost to reload metadata from a local disk
is likely much lower than from S3)
The relative tradeoffs between these factors likely varies substantially
from system to sytem
Users with more advanced needs can implement whatever strategy is most
appropriate to their system via the traits
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]