keith-turner opened a new issue, #5559: URL: https://github.com/apache/accumulo/issues/5559
**Is your feature request related to a problem? Please describe.**

Some ranges or locality groups within a table or tablet may never benefit from being cached, and caching them unnecessarily pollutes the cache. Currently caching is only tunable at the table and scan level; tuning based on the data within a table is not possible.

**Describe the solution you'd like**

A pluggable mechanism to tune what specific data in a table should and should not be cached. Prior to loading a block into cache, the information in [IndexEntry](https://github.com/apache/accumulo/blob/77078f321fb492c0d80499973df04b386282e06f/core/src/main/java/org/apache/accumulo/core/file/rfile/MultiLevelIndex.java#L50) is known. The key field in IndexEntry is the last key in the data block, and it is easy in the RFile code to obtain the previous index entry, which has the last key of the previous block. These two keys can be used to construct a range that the data block covers. The RFile code also knows the locality group it is operating on. With this information an SPI interface like the following could be constructed. With this interface a user could somehow provide a `Predicate<CacheRequest>` that makes fine-grained decisions about what data to actually cache.

```java
import org.apache.accumulo.core.data.Range;

/**
 * Provides the characteristics of a block of data that is under consideration for caching.
 */
interface CacheRequest {
  /**
   * @return the number of entries in the cached block
   */
  int getNumEntries();

  /**
   * @return the uncompressed size of the cached block
   */
  long getUncompressedSize();

  /**
   * @return the compressed size of the cached block
   */
  long getCompressedSize();

  /**
   * @return the name of the locality group of the cache block
   */
  String getLocalityGroup();

  /**
   * @return the data range of the cache block
   */
  Range getDataRange();
}
```

Not sure where this should be added to the SPI. It could be added to BlockCacheManager, but that would make configuring this per table really cumbersome, as BlockCacheManager is configured at the server level.
It could be added to the existing ScanDispatch SPI which already allows making some per scan caching decisions in the SPI. Scan dispatching can be configured per table already. Or maybe it could be added somewhere else.
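
To make the `Predicate<CacheRequest>` idea above more concrete, here is a minimal, self-contained sketch of how such predicates might be composed. It is not Accumulo code: `KeyRange` is a hypothetical stand-in for Accumulo's `Range` that uses plain strings as keys, and `blockRange`, `request`, `notInGroups`, `atMostBytes`, and `endsBeforeKey` are illustrative names that appear nowhere in the proposal. How a predicate would actually be registered with a tablet server is left open, as it is in the issue.

```java
import java.util.Set;
import java.util.function.Predicate;

public class CacheRequestSketch {

  /** Hypothetical stand-in for Accumulo's Range; keys are plain strings here. */
  static final class KeyRange {
    final String startKeyExclusive; // null means the block is the first in the file
    final String endKeyInclusive;   // last key of the block, taken from its index entry

    KeyRange(String startKeyExclusive, String endKeyInclusive) {
      this.startKeyExclusive = startKeyExclusive;
      this.endKeyInclusive = endKeyInclusive;
    }
  }

  /** The interface proposed in the issue, with KeyRange substituted for Range. */
  interface CacheRequest {
    int getNumEntries();
    long getUncompressedSize();
    long getCompressedSize();
    String getLocalityGroup();
    KeyRange getDataRange();
  }

  /** Builds the range a block covers from the last keys of two consecutive index entries. */
  static KeyRange blockRange(String prevBlockLastKey, String thisBlockLastKey) {
    return new KeyRange(prevBlockLastKey, thisBlockLastKey);
  }

  /** Creates an immutable CacheRequest; used only for this demonstration. */
  static CacheRequest request(int entries, long rawSize, long compressedSize,
      String group, KeyRange range) {
    return new CacheRequest() {
      public int getNumEntries() { return entries; }
      public long getUncompressedSize() { return rawSize; }
      public long getCompressedSize() { return compressedSize; }
      public String getLocalityGroup() { return group; }
      public KeyRange getDataRange() { return range; }
    };
  }

  /** Rejects blocks that belong to any of the given locality groups. */
  static Predicate<CacheRequest> notInGroups(Set<String> groups) {
    return req -> !groups.contains(req.getLocalityGroup());
  }

  /** Rejects blocks whose uncompressed size exceeds the limit. */
  static Predicate<CacheRequest> atMostBytes(long maxUncompressedSize) {
    return req -> req.getUncompressedSize() <= maxUncompressedSize;
  }

  /** Accepts only blocks whose last key sorts strictly before the cutoff key. */
  static Predicate<CacheRequest> endsBeforeKey(String cutoffKey) {
    return req -> req.getDataRange().endKeyInclusive.compareTo(cutoffKey) < 0;
  }

  public static void main(String[] args) {
    // Cache only small blocks outside the "raw" group that end before row_5000.
    Predicate<CacheRequest> shouldCache = notInGroups(Set.of("raw"))
        .and(atMostBytes(1L << 20))
        .and(endsBeforeKey("row_5000"));

    KeyRange range = blockRange("row_0100", "row_0200");
    CacheRequest metaBlock = request(100, 64_000L, 16_000L, "meta", range);
    CacheRequest rawBlock = request(100, 64_000L, 16_000L, "raw", range);

    System.out.println("cache meta block? " + shouldCache.test(metaBlock));
    System.out.println("cache raw block?  " + shouldCache.test(rawBlock));
  }
}
```

Keeping each rule as its own small predicate and combining them with `Predicate.and` would let a per-table configuration assemble only the checks it needs, which might fit the per-table ScanDispatch option better than a server-level one.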