Prathyusha created HBASE-28878:
----------------------------------
Summary: Introduce Cache for SFT instances created via
StoreFileTrackerFactory
Key: HBASE-28878
URL: https://issues.apache.org/jira/browse/HBASE-28878
Project: HBase
Issue Type: Improvement
Reporter: Prathyusha
As part of HBASE-28564, the creation of HStoreFile has been made SFT-aware, so
any time a store file is created, it needs an SFT instance.
As a result, every interaction with an HStoreFile now requires an SFT instance.
In the case of FileBasedStoreFileTracker, each new instance loads the backing
.filelist file, which can be a costly operation on S3.
This Jira aims to introduce a cache layer in StoreFileTrackerFactory that keeps
one SFT instance per _TableName + Region + CF + Mode_ (the Write/ReadOnly mode
of the SFT).
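To make the proposed cache key concrete, here is a minimal Java sketch, assuming a plain ConcurrentHashMap keyed by a composite record. StoreKey, Tracker and CachingTrackerFactory are hypothetical stand-ins, not HBase APIs; the boolean readOnly flag models the Write/ReadOnly mode, and the sketch does not reproduce the real StoreFileTrackerFactory.create() signature.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch of a per-store tracker cache; none of these types exist in HBase. */
public class SftCacheSketch {

  /** Cache key: TableName + Region + CF + Mode (readOnly models the Write/ReadOnly mode). */
  record StoreKey(String table, String region, String family, boolean readOnly) {}

  /** Minimal stand-in for a StoreFileTracker. */
  interface Tracker {
    StoreKey key();
  }

  /** Builds each tracker at most once per key and returns the cached instance afterwards. */
  static final class CachingTrackerFactory {
    private final ConcurrentMap<StoreKey, Tracker> cache = new ConcurrentHashMap<>();
    private final Function<StoreKey, Tracker> creator;

    CachingTrackerFactory(Function<StoreKey, Tracker> creator) {
      this.creator = Objects.requireNonNull(creator);
    }

    /** Returns the cached tracker for this key, creating it atomically on first use. */
    Tracker create(String table, String region, String family, boolean readOnly) {
      return cache.computeIfAbsent(new StoreKey(table, region, family, readOnly), creator);
    }

    /** Drops a cached tracker, e.g. when its region is closed or its CF is removed. */
    void invalidate(StoreKey key) {
      cache.remove(key);
    }
  }

  public static void main(String[] args) {
    AtomicInteger loads = new AtomicInteger();
    // Each creation stands in for an expensive step such as reading .filelist from S3.
    CachingTrackerFactory factory = new CachingTrackerFactory(k -> {
      loads.incrementAndGet();
      return () -> k;
    });
    Tracker a = factory.create("t1", "r1", "cf", false);
    Tracker b = factory.create("t1", "r1", "cf", false); // same key: cached instance
    Tracker c = factory.create("t1", "r1", "cf", true);  // different mode: new instance
    System.out.println((a == b) + " " + (a == c) + " " + loads.get()); // prints "true false 2"
  }
}
```

computeIfAbsent performs the creation atomically per key, so concurrent callers for the same store share one instance. Its Javadoc asks that the mapping function be short and simple, which suggests deferring an expensive load such as reading .filelist to the tracker's first use instead of doing it inside computeIfAbsent. Some eviction hook (invalidate here) would also be needed so that trackers for closed regions are not kept alive.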
A more detailed discussion of the thought process is
[here|https://github.com/apache/hbase/pull/5939#discussion_r1759312918]:
{quote}
Every time we create a StoreFileTracker object, it starts with no state, so as
soon as we try to use it, it must initialize itself by either listing the
directory on the filesystem or reading the tracker file, depending on which
type it is.
That's fine, because the original code also causes IO. However, what do you
think about the possibility of reuse? This is a more general question than a
comment about this particular call site. Should the StoreFileTrackerFactory
cache instances and return the cached instances that match the arguments to
StoreFileTrackerFactory.create() rather than make a new instance? Can
StoreFileTracker instances be made thread-safe so they can be cached and
shared?
If we have reuse, and all the relevant filesystem ops go through the
StoreFileTracker, then we could potentially save a lot of filesystem or object
store IO, because a reused StoreFileTracker would already have the ground truth
and would not need to do IO against the filesystem or object store to, e.g.,
return the StoreFileInfo of a given path.
{quote}
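The thread-safety question above can be illustrated with a minimal sketch, assuming a tracker's state is just a set of file paths. LazyTracker and its loader are hypothetical; the loader stands in for listing the store directory or reading .filelist. The expensive load runs at most once (double-checked locking on a volatile field), after which lookups are answered from memory without IO. A real tracker would also have to persist a change (for example, rewrite .filelist) before publishing it in memory, which this sketch omits.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical sketch of a shareable tracker; not an HBase API. */
public class LazyTrackerSketch {

  static final class LazyTracker {
    // Stands in for listing the store directory or reading the .filelist file.
    private final Supplier<List<String>> loader;
    // Immutable snapshot; null until first use, safely published via volatile.
    private volatile Set<String> files;

    LazyTracker(Supplier<List<String>> loader) {
      this.loader = loader;
    }

    /** Double-checked locking: at most one thread performs the expensive load. */
    private Set<String> files() {
      Set<String> snapshot = files;
      if (snapshot == null) {
        synchronized (this) {
          snapshot = files;
          if (snapshot == null) {
            snapshot = Set.copyOf(loader.get());
            files = snapshot;
          }
        }
      }
      return snapshot;
    }

    /** Answered from memory once loaded; no further IO. */
    boolean isTracked(String path) {
      return files().contains(path);
    }

    /** Copy-on-write update so concurrent readers always see a complete set. */
    synchronized void add(String path) {
      Set<String> updated = new HashSet<>(files());
      updated.add(path);
      files = Set.copyOf(updated);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicInteger ioCalls = new AtomicInteger();
    LazyTracker tracker = new LazyTracker(() -> {
      ioCalls.incrementAndGet();
      return List.of("cf/a.hfile", "cf/b.hfile");
    });
    // Several threads share one tracker; only the first use triggers the load.
    Thread[] readers = new Thread[4];
    for (int i = 0; i < readers.length; i++) {
      readers[i] = new Thread(() -> tracker.isTracked("cf/a.hfile"));
      readers[i].start();
    }
    for (Thread t : readers) {
      t.join();
    }
    tracker.add("cf/c.hfile");
    System.out.println(tracker.isTracked("cf/c.hfile") + " " + ioCalls.get()); // prints "true 1"
  }
}
```

Publishing immutable snapshots through a volatile field keeps reads lock-free; only the first load and writers take the lock.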
----
--
This message was sent by Atlassian Jira
(v8.20.10#820010)