jhungund commented on PR #5832: URL: https://github.com/apache/hbase/pull/5832#issuecomment-2063509932
> > The time-based priority eviction policy relies on the presence of path in the BlockCacheKey to fetch the required metadata to check data hotness and decide whether or not to retain the block in the bucket cache.
>
> Can you explain why do you need that? This is not a simple/small change.

Our design originally relied on the presence of the path in the BlockCacheKey. From the path we can parse out the region ID and column family, reach the file, and access its metadata (using the framework that @vinayakphegde implemented). Using the path avoids the overhead of traversing all regions, column families, and their files. We would pay that overhead if we relied only on the file names to fetch the corresponding metadata.

However, we found that the path is not always populated by the callers that instantiate BlockCacheKey. This change therefore requires all callers, including unit tests, to always create a BlockCacheKey with the path, which is why it turned out to be a big change.

An alternative approach relies only on the file name, which is always present in the BlockCacheKey. With this approach, we do not need to change the callers of BlockCacheKey or the unit tests. The cost is that during cache eviction (freeSpace), we need one traversal through all the files.

I have tried implementing this approach in another change: https://github.com/apache/hbase/pull/5829

Please take a look and let me know what you think.
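To make the trade-off between the two approaches concrete, here is a minimal Java sketch. Everything in it is hypothetical and is not the HBase or PR code. This includes `HotnessLookupSketch`, the `FileMeta` record, the nested map standing in for regions, column families, and files, and the `isHot` time-window check. It assumes the usual store-file layout `.../<encoded-region>/<column-family>/<hfile-name>`, and that HFile names are unique across stores. The path-based lookup is a direct map access. The file-name-only lookup pays for one full traversal to build an index, which is then reused for every block examined during eviction.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch contrasting path-based and file-name-based metadata lookup.
// None of these names are HBase APIs.
class HotnessLookupSketch {

  // Hypothetical per-file metadata used to judge data hotness.
  record FileMeta(String region, String family, String fileName, long maxTimestamp) {}

  // Hypothetical stand-in for the hierarchy: region -> family -> fileName -> metadata.
  private final Map<String, Map<String, Map<String, FileMeta>>> stores = new HashMap<>();

  void addFile(FileMeta meta) {
    stores.computeIfAbsent(meta.region(), r -> new HashMap<>())
        .computeIfAbsent(meta.family(), f -> new HashMap<>())
        .put(meta.fileName(), meta);
  }

  // Approach 1: the key carries the full path, so region and family are read from
  // the last three path components and the metadata is found without any traversal.
  Optional<FileMeta> lookupByPath(String path) {
    String[] parts = path.split("/");
    if (parts.length < 3) {
      return Optional.empty();
    }
    String fileName = parts[parts.length - 1];
    String family = parts[parts.length - 2];
    String region = parts[parts.length - 3];
    return Optional.ofNullable(
        stores.getOrDefault(region, Map.of()).getOrDefault(family, Map.of()).get(fileName));
  }

  // Approach 2: the key carries only the file name, so walk every region and family
  // once (e.g. at the start of freeSpace) to build a fileName -> metadata index.
  // Assumes file names are unique; a duplicate name would overwrite an earlier entry.
  Map<String, FileMeta> buildFileNameIndex() {
    Map<String, FileMeta> index = new HashMap<>();
    for (Map<String, Map<String, FileMeta>> families : stores.values()) {
      for (Map<String, FileMeta> files : families.values()) {
        index.putAll(files);
      }
    }
    return index;
  }

  // Hypothetical hotness rule: data is hot if its newest cell is within the window.
  static boolean isHot(FileMeta meta, long nowMs, long hotWindowMs) {
    return nowMs - meta.maxTimestamp() <= hotWindowMs;
  }

  public static void main(String[] args) {
    HotnessLookupSketch sketch = new HotnessLookupSketch();
    sketch.addFile(new FileMeta("r1", "cf", "f1", 1000L));
    sketch.addFile(new FileMeta("r2", "cf", "f2", 5000L));

    // Path-based: direct lookup per block.
    System.out.println(sketch.lookupByPath("/hbase/data/default/t1/r1/cf/f1").isPresent());

    // File-name-based: one traversal, then per-block lookups against the index.
    Map<String, FileMeta> index = sketch.buildFileNameIndex();
    System.out.println(isHot(index.get("f2"), 6000L, 2000L));
  }
}
```

The index in the second approach is built once per eviction pass rather than once per block. That keeps the extra cost to a single traversal, matching the trade-off described above.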
