keith-turner opened a new issue, #5559:
URL: https://github.com/apache/accumulo/issues/5559

   **Is your feature request related to a problem? Please describe.**
   
   Some ranges or locality groups within a table or tablet may never benefit 
from being cached, and caching them unnecessarily pollutes the cache.  
Currently caching is only tunable at the table and scan level; tuning based 
on the data within a table is not possible.
   
   **Describe the solution you'd like**
   
   A pluggable mechanism to tune what specific data in a table should and 
should not be cached.  
   
   Prior to loading a block into cache, the information in 
[IndexEntry](https://github.com/apache/accumulo/blob/77078f321fb492c0d80499973df04b386282e06f/core/src/main/java/org/apache/accumulo/core/file/rfile/MultiLevelIndex.java#L50)
 is known.  The key field in IndexEntry is the last key in the data block, and 
it is easy in the RFile code to obtain the previous index entry, which has the 
last key of the previous block.  These two keys can be used to construct a 
range that the data block covers.  The RFile code also knows the locality 
group it is operating on.  
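   
   A minimal sketch of the range construction described above, under some 
assumptions: plain strings stand in for Accumulo keys, and `BlockRange` and 
`BlockRanges` are hypothetical names that do not exist in the RFile code.  
The lower bound is treated as exclusive, which assumes a key never repeats 
across a block boundary.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;

   /** Hypothetical stand-in for the range a data block covers: (startExclusive, endInclusive]. */
   final class BlockRange {
     final String startExclusive; // null means no lower bound (first block in the index)
     final String endInclusive;   // last key in the block, taken from its index entry

     BlockRange(String startExclusive, String endInclusive) {
       this.startExclusive = startExclusive;
       this.endInclusive = endInclusive;
     }

     boolean contains(String key) {
       boolean afterStart = startExclusive == null || key.compareTo(startExclusive) > 0;
       return afterStart && key.compareTo(endInclusive) <= 0;
     }
   }

   final class BlockRanges {
     /** lastKeys holds the last key of each data block, in index order. */
     static List<BlockRange> fromIndex(List<String> lastKeys) {
       List<BlockRange> ranges = new ArrayList<>();
       String prev = null; // last key of the previous block, if any
       for (String last : lastKeys) {
         ranges.add(new BlockRange(prev, last));
         prev = last;
       }
       return ranges;
     }
   }
   ```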
   
   With this information, an SPI interface like the following could be 
constructed.  With this interface a user could somehow provide a 
`Predicate<CacheRequest>` that makes fine-grained decisions about what data to 
actually cache.   
   
   ```java
     /**
      * Provides the characteristics of a block of data that is under
      * consideration for caching.
      */
     interface CacheRequest {
       /**
        * @return the number of entries in the cached block
        */
       int getNumEntries();
       /**
        * @return the uncompressed size of the cached block
        */
       long getUncompressedSize();
       /**
        * @return the compressed size of the cached block
        */
       long getCompressedSize();
       /**
        * @return the name of the locality group of the cache block
        */
       String getLocalityGroup();
       /**
        * @return the data range of the cache block
        */
       Range getDataRange();
     }
   ```
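   
   As an illustration of the kind of predicate a user might supply, here is a 
sketch under stated assumptions.  The `CacheRequest` below is a trimmed copy 
of the proposed interface (`getDataRange()` is left out so the example has no 
Accumulo dependency), and `ExampleCachePolicy`, the skipped group names, and 
the size threshold are all hypothetical.
   
   ```java
   import java.util.HashSet;
   import java.util.Set;
   import java.util.function.Predicate;

   // Trimmed copy of the proposed interface, for illustration only.
   interface CacheRequest {
     int getNumEntries();
     long getUncompressedSize();
     String getLocalityGroup();
   }

   final class ExampleCachePolicy {
     /**
      * Builds a predicate that returns true when a block should be cached: its
      * locality group is not in skippedGroups and it is no larger than maxBytes.
      */
     static Predicate<CacheRequest> create(Set<String> skippedGroups, long maxBytes) {
       // Copy into a HashSet so contains(null) is safe if a group name is null.
       Set<String> skipped = new HashSet<>(skippedGroups);
       Predicate<CacheRequest> groupAllowed = req -> !skipped.contains(req.getLocalityGroup());
       Predicate<CacheRequest> smallEnough = req -> req.getUncompressedSize() <= maxBytes;
       return groupAllowed.and(smallEnough);
     }
   }
   ```
   
   Composing small predicates with `and` keeps each rule independently 
testable; a range-based rule could be added the same way once 
`getDataRange()` is available.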
   
   Not sure where this should be added to the SPI.  It could be added to 
BlockCacheManager, but that would make configuring this per table really 
cumbersome, as BlockCacheManager is configured at the server level.  It could 
be added to the existing ScanDispatch SPI, which already allows making some 
per-scan caching decisions and can already be configured per table.  
Or maybe it could be added somewhere else.
   

