apurtell commented on code in PR #5939:
URL: https://github.com/apache/hbase/pull/5939#discussion_r1759292740
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/SplitTableRegionProcedure.java:
##########
@@ -740,8 +742,9 @@ private Pair<List<Path>, List<Path>> splitStoreFiles(final MasterProcedureEnv en
       // is running in a regionserver's Store context, or we might not be able
       // to read the hfiles.
       storeFileInfo.setConf(storeConfiguration);
-      StoreFileSplitter sfs = new StoreFileSplitter(regionFs, familyName,
-        new HStoreFile(storeFileInfo, hcd.getBloomFilterType(), CacheConfig.DISABLED));
+      StoreFileSplitter sfs =
Review Comment:
Nit: Assign the value of .getSecond() to a local variable of type
`StoreFileTracker` and use that variable instead so it is easier to follow what
is going on here.
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/SplitTableRegionProcedure.java:
##########
@@ -726,10 +727,11 @@ private Pair<List<Path>, List<Path>> splitStoreFiles(final MasterProcedureEnv en
     final List<Future<Pair<Path, Path>>> futures = new ArrayList<Future<Pair<Path, Path>>>(nbFiles);
     // Split each store file.
-    for (Map.Entry<String, Collection<StoreFileInfo>> e : files.entrySet()) {
+    for (Map.Entry<String, Pair<Collection<StoreFileInfo>, StoreFileTracker>> e : files
+      .entrySet()) {
       byte[] familyName = Bytes.toBytes(e.getKey());
       final ColumnFamilyDescriptor hcd = htd.getColumnFamily(familyName);
-      final Collection<StoreFileInfo> storeFiles = e.getValue();
+      final Collection<StoreFileInfo> storeFiles = e.getValue().getFirst();
Review Comment:
Nit: Assign the value of .getFirst() to a local variable of type
`Collection<StoreFileInfo>` and use that variable instead so it is easier to
follow what is going on here.
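The pattern both of these nits ask for (naming the pair members instead of chaining `.getFirst()`/`.getSecond()` inline) can be sketched like this. This is a self-contained illustration only: the `Pair` class here is a minimal stand-in, not `org.apache.hadoop.hbase.util.Pair`, and `describe` is a hypothetical method, not code from the PR.

```java
public class PairNitSketch {
  // Minimal stand-in for a two-member pair type; not the actual HBase Pair.
  static final class Pair<A, B> {
    private final A first;
    private final B second;
    Pair(A first, B second) { this.first = first; this.second = second; }
    A getFirst() { return first; }
    B getSecond() { return second; }
  }

  static String describe(Pair<java.util.List<String>, String> entry) {
    // The nit: pull the pair members into descriptively named, typed locals
    // up front, rather than chaining entry.getFirst()/entry.getSecond() inline.
    final java.util.List<String> storeFiles = entry.getFirst();
    final String tracker = entry.getSecond();
    return storeFiles.size() + " store files tracked by " + tracker;
  }

  public static void main(String[] args) {
    System.out.println(describe(new Pair<>(java.util.List.of("a", "b"), "FILE")));
  }
}
```

The extra locals cost nothing after compilation but make the loop body read in terms of domain names instead of positional accessors.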
##########
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionFileSystem.java:
##########
@@ -352,33 +351,36 @@ public void setWorkingDirectory(Path arg0) {
}
@Test
+ // TODO: move this to test layer of SFT interface
Review Comment:
What is the plan for this? To do for this issue, or a follow up JIRA? Is it
filed yet or will it be? etc.
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionFileSystem.java:
##########
@@ -351,18 +306,19 @@ Path getStoreFilePath(final String familyName, final String fileName) {
    * @param fileName File Name
    * @return The {@link StoreFileInfo} for the specified family/file
    */
-  StoreFileInfo getStoreFileInfo(final String familyName, final String fileName)
-    throws IOException {
+  StoreFileInfo getStoreFileInfo(final String familyName, final String fileName,
+    final StoreFileTracker tracker) throws IOException {
     Path familyDir = getStoreDir(familyName);
     return ServerRegionReplicaUtil.getStoreFileInfo(conf, fs, regionInfo, regionInfoForFs,
-      familyName, new Path(familyDir, fileName));
+      familyName, new Path(familyDir, fileName), tracker);
   }
/**
* Returns true if the specified family has reference files
* @param familyName Column Family Name
* @return true if family contains reference files
*/
+ // TODO: move this to SFT
Review Comment:
What is the plan here? What are you thinking?
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/storefiletracker/FileBasedStoreFileTracker.java:
##########
@@ -77,7 +77,7 @@ protected List<StoreFileInfo> doLoadStoreFiles(boolean readOnly) throws IOExcept
     for (StoreFileEntry entry : list.getStoreFileList()) {
       infos.add(ServerRegionReplicaUtil.getStoreFileInfo(conf, fs, ctx.getRegionInfo(),
         ctx.getRegionFileSystem().getRegionInfoForFS(), ctx.getFamily().getNameAsString(),
-        new Path(ctx.getFamilyStoreDirectoryPath(), entry.getName())));
+        new Path(ctx.getFamilyStoreDirectoryPath(), entry.getName()), this));
Review Comment:
A question about `FileBasedStoreFileTracker` in general.
So the plan is to refactor the reference file handling into
`StoreFileTrackerBase` and simply inherit it here for now, otherwise leaving it
unchanged; then, on some later JIRA, perhaps evolve the representation and
handling of references with overrides in `FileBasedStoreFileTracker`.
Correct? That's why we are moving more into SFT? :-)
Consider writing up a couple of sentences about the motivation of this work
and the long term directions and putting that somewhere near the top of the PR
description?
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/mob/MobFile.java:
##########
@@ -134,11 +135,13 @@ public void close() throws IOException {
    * @param cacheConf The CacheConfig.
    * @return An instance of the MobFile.
    */
-  public static MobFile create(FileSystem fs, Path path, Configuration conf, CacheConfig cacheConf)
-    throws IOException {
-    // XXX: primaryReplica is only used for constructing the key of block cache so it is not a
-    // critical problem if we pass the wrong value, so here we always pass true. Need to fix later.
-    HStoreFile sf = new HStoreFile(fs, path, conf, cacheConf, BloomType.NONE, true);
+  public static MobFile create(FileSystem fs, Path path, Configuration conf, CacheConfig cacheConf,
+    StoreFileTracker sft) throws IOException {
+    // XXX: primaryReplica is only used for constructing the key of block cache so
Review Comment:
Fix formatting
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/storefiletracker/DefaultStoreFileTracker.java:
##########
@@ -37,6 +45,8 @@ public DefaultStoreFileTracker(Configuration conf, boolean isPrimaryReplica, Sto
     super(conf, isPrimaryReplica, ctx);
   }
+  private static final Logger LOG = LoggerFactory.getLogger(DefaultStoreFileTracker.class);
Review Comment:
Now that we have a Logger, and already there is at least one trace level
log, what do you think about adding trace level logging for all the SFT
activities? That might help with debugging issues in the future?
This comment applies for all SFT impls.
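The guarded trace-logging pattern being suggested might look like the sketch below. HBase itself uses slf4j (`LOG.trace` with `{}` placeholders); `java.util.logging` is used here only so the example is self-contained, and `logAdd`/`traceMessage` are hypothetical names, not methods from the PR.

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

public class SftTraceSketch {
  private static final Logger LOG = Logger.getLogger("SftTraceSketch");

  // Message construction is factored out so it is cheap to test; in slf4j the
  // {} placeholder form gives the same deferral without an explicit guard.
  static String traceMessage(String op, List<String> files) {
    return op + ": " + files.size() + " file(s) " + files;
  }

  // Hypothetical hook point: called wherever the tracker mutates its file set.
  static void logAdd(List<String> newFiles) {
    // Guard so the message is only built when trace-level output is enabled.
    if (LOG.isLoggable(Level.FINEST)) {
      LOG.finest(traceMessage("add", newFiles));
    }
  }

  public static void main(String[] args) {
    logAdd(List.of("hfile-a", "hfile-b"));
    System.out.println(traceMessage("add", List.of("hfile-a", "hfile-b")));
  }
}
```

Instrumenting each mutation point (`add`, `replace`, `set`) this way would give a low-cost audit trail of SFT activity when trace logging is switched on.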
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:
##########
@@ -1347,7 +1349,9 @@ public static HDFSBlocksDistribution computeHDFSBlocksDistribution(Configuration
       if (StoreFileInfo.isReference(p) || HFileLink.isHFileLink(p)) {
         // Only construct StoreFileInfo object if its not a hfile, save obj
         // creation
-        StoreFileInfo storeFileInfo = new StoreFileInfo(conf, fs, status);
+        StoreFileTracker sft =
+          StoreFileTrackerFactory.create(conf, tableDescriptor, family, regionFs);
+        StoreFileInfo storeFileInfo = sft.getStoreFileInfo(status, status.getPath(), false);
Review Comment:
Every time we create a StoreFileTracker object it will have no state, and so
it will either need to go to the filesystem and list the directory or read the
tracker file depending on which type it is in order to initialize as soon as we
try to use it.
It's fine, because the original code causes IO to happen as well. However,
what do you think about the possibility of reuse? This is a more general
question than a comment about this particular call site. Should
StoreFileTrackerFactory cache instances and return the cached instance that
matches the arguments to StoreFileTrackerFactory.create(), rather than make a
new instance? Can StoreFileTracker instances be made thread safe so they can
be cached and shared?
If we have reuse, and all the relevant filesystem ops go through the
StoreFileTracker, then we could potentially save a lot of filesystem or object
store IO, because a reused StoreFileTracker would have the ground truth already
and would not need to go to the filesystem or object store and do IO in order
to e.g. return the StoreFileInfo of a given path.
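The factory-level reuse being floated here could take a shape like the sketch below. This is purely illustrative: the `Tracker` class and string cache key are stand-ins (real code would key on table, family, and region, make trackers thread safe, and invalidate entries when the store's file set changes out from under the cache).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TrackerCacheSketch {
  // Hypothetical stand-in for a StoreFileTracker implementation.
  static final class Tracker {
    final String key;
    Tracker(String key) { this.key = key; }
  }

  private static final Map<String, Tracker> CACHE = new ConcurrentHashMap<>();

  // Possible factory shape: same arguments return the same cached instance,
  // so an already-initialized tracker's view of the store files is reused
  // instead of re-listing the directory or re-reading the tracker file.
  static Tracker create(String table, String family, String region) {
    String key = table + "/" + region + "/" + family;
    // computeIfAbsent constructs the tracker at most once per key and is
    // atomic, so concurrent callers cannot race to create duplicates.
    return CACHE.computeIfAbsent(key, Tracker::new);
  }

  public static void main(String[] args) {
    Tracker a = create("t1", "cf", "r1");
    Tracker b = create("t1", "cf", "r1");
    System.out.println(a == b); // prints "true": the instance is reused
  }
}
```

The hard part is not the cache itself but the lifecycle: entries must be evicted on region close, split, or merge, or the cached "ground truth" would go stale.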
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/mob/CachedMobFile.java:
##########
@@ -41,10 +42,12 @@ public CachedMobFile(HStoreFile sf) {
   }
   public static CachedMobFile create(FileSystem fs, Path path, Configuration conf,
-    CacheConfig cacheConf) throws IOException {
-    // XXX: primaryReplica is only used for constructing the key of block cache so it is not a
-    // critical problem if we pass the wrong value, so here we always pass true. Need to fix later.
-    HStoreFile sf = new HStoreFile(fs, path, conf, cacheConf, BloomType.NONE, true);
+    CacheConfig cacheConf, StoreFileTracker sft) throws IOException {
+    // XXX: primaryReplica is only used for constructing the key of block cache so
Review Comment:
Fix formatting
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotManifest.java:
##########
@@ -341,13 +342,18 @@ protected void addRegion(Path tableDir, RegionInfo regionInfo, RegionVisitor vis
     }
   }
-  private List<StoreFileInfo> getStoreFiles(Path storeDir) throws IOException {
-    FileStatus[] stats = CommonFSUtils.listStatus(rootFs, storeDir);
+  private List<StoreFileInfo> getStoreFiles(Path storePath, TableDescriptor htd,
Review Comment:
Yeah I remember from my earlier work in this area that we need to plumb
`TableDescriptor` through all of these places now. :-/
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]