jhungund commented on PR #5832:
URL: https://github.com/apache/hbase/pull/5832#issuecomment-2063509932

   > > The time-based priority eviction policy relies on the presence of path in
   > > the BlockCacheKey to fetch the required metadata to check data hotness and
   > > decide whether or not to retain the block in the bucket cache.
   > 
   > Can you explain why do you need that? This is not a simple/small change.
   
   Our design originally relied on the path being present in the BlockCacheKey: 
we parse the path to extract the region ID and column family, then use them to 
locate the file and access its metadata (the framework that @vinayakphegde has 
implemented).
   Using the path avoids the overhead of traversing all regions, column 
families, and their files, which we would incur if we relied only on file names 
to fetch the corresponding metadata.
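
   As a rough illustration of the path-based lookup, the sketch below pulls 
the region and column family out of a store file path. It assumes the usual 
layout in which a store file lives at `.../<region>/<family>/<file>`; the class 
and method names (`StoreFilePathSketch`, `Location`, `parse`) are hypothetical 
and are not part of the PR.

```java
public class StoreFilePathSketch {

  /** Hypothetical holder for the three trailing path components. */
  static final class Location {
    final String region;
    final String family;
    final String fileName;

    Location(String region, String family, String fileName) {
      this.region = region;
      this.family = family;
      this.fileName = fileName;
    }
  }

  /**
   * Assumes the path ends with <region>/<family>/<file>.
   * Throws if fewer than three non-empty trailing components are present.
   */
  static Location parse(String path) {
    String[] parts = path.split("/");
    int n = parts.length;
    if (n < 3 || parts[n - 3].isEmpty() || parts[n - 2].isEmpty() || parts[n - 1].isEmpty()) {
      throw new IllegalArgumentException("Not a store file path: " + path);
    }
    return new Location(parts[n - 3], parts[n - 2], parts[n - 1]);
  }

  public static void main(String[] args) {
    // Example path with made-up region and file names.
    Location loc = parse("/hbase/data/default/t1/1588230740/cf/abcdef");
    System.out.println(loc.region + " " + loc.family + " " + loc.fileName);
  }
}
```

   With the region and family known, the metadata lookup can go straight to 
one store instead of scanning every region and family.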
   
   However, we found that callers that instantiate BlockCacheKey do not always 
populate the path.
   Hence, this change forces all callers, including the unit tests, to always 
create BlockCacheKey with a path.
   That is why it turned out to be a large change.
   
   An alternative approach relies only on the file name, which is always 
present in the BlockCacheKey. It requires no changes to the callers of 
BlockCacheKey or to the unit tests.
   The cost is that cache eviction (freeSpace) requires one traversal through 
all the files. I have tried to implement this approach in another change: 
https://github.com/apache/hbase/pull/5829
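
   The file-name approach can be sketched roughly as follows: one traversal 
over regions, column families, and files builds a map keyed by file name, and 
the eviction pass then looks up each block's file name in that map. Everything 
here is an assumption made for illustration: the `StoreFileInfo` type, the use 
of a maximum timestamp as the hotness metadata, and the choice to treat unknown 
files as hot. It is not the code in either PR.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FileNameIndexSketch {

  /** Hypothetical per-file metadata: file name plus newest cell timestamp. */
  static final class StoreFileInfo {
    final String fileName;
    final long maxTimestampMs;

    StoreFileInfo(String fileName, long maxTimestampMs) {
      this.fileName = fileName;
      this.maxTimestampMs = maxTimestampMs;
    }
  }

  /**
   * Single traversal: region -> column family -> files.
   * Returns file name -> max timestamp, built once per eviction run.
   */
  static Map<String, Long> indexByFileName(
      Map<String, Map<String, List<StoreFileInfo>>> regions) {
    Map<String, Long> index = new HashMap<>();
    for (Map<String, List<StoreFileInfo>> families : regions.values()) {
      for (List<StoreFileInfo> files : families.values()) {
        for (StoreFileInfo f : files) {
          index.put(f.fileName, f.maxTimestampMs);
        }
      }
    }
    return index;
  }

  /** A file is hot if its newest data is younger than hotAgeMs. */
  static boolean isHot(String fileName, Map<String, Long> index, long nowMs, long hotAgeMs) {
    Long maxTs = index.get(fileName);
    if (maxTs == null) {
      return true; // assumption: keep blocks whose file metadata is unknown
    }
    return nowMs - maxTs < hotAgeMs;
  }

  public static void main(String[] args) {
    Map<String, Map<String, List<StoreFileInfo>>> regions = new HashMap<>();
    Map<String, List<StoreFileInfo>> families = new HashMap<>();
    families.put("cf", List.of(new StoreFileInfo("old", 1_000L), new StoreFileInfo("new", 9_000L)));
    regions.put("region-a", families);

    Map<String, Long> index = indexByFileName(regions);
    System.out.println(isHot("old", index, 10_000L, 5_000L));
    System.out.println(isHot("new", index, 10_000L, 5_000L));
  }
}
```

   The trade-off matches the one described above: callers stay untouched, and 
the traversal cost is paid once per eviction run rather than per block.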
   
   Please take a look and let me know what you think.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.