Apache9 commented on a change in pull request #3751:
URL: https://github.com/apache/hbase/pull/3751#discussion_r732785723
##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/AbstractMultiFileWriter.java
##########
@@ -110,7 +110,7 @@ public void init(StoreScanner sourceScanner, WriterFactory factory) {
return paths;
}
- protected abstract Collection<StoreFileWriter> writers();
+ public abstract Collection<StoreFileWriter> writers();
Review comment:
Why do we need these changes in the compactor?
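For context, `writers()` is only forced to `public` if a non-subclass caller outside the `org.apache.hadoop.hbase.regionserver` package needs it, e.g. something under the compactions subpackage. A minimal hypothetical caller (not code from this PR) would be:

```java
// Hypothetical non-subclass caller in another package; with the original
// `protected` modifier this method call would not compile.
package org.apache.hadoop.hbase.regionserver.compactions;

import java.util.Collection;

import org.apache.hadoop.hbase.regionserver.AbstractMultiFileWriter;
import org.apache.hadoop.hbase.regionserver.StoreFileWriter;

final class WritersAccessSketch {
  static Collection<StoreFileWriter> openWriters(AbstractMultiFileWriter w) {
    return w.writers(); // visible here only once writers() is public
  }
}
```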
##########
File path: hbase-rest/src/main/java/org/apache/hadoop/hbase/rest/RootResource.java
##########
@@ -92,6 +92,12 @@ public StorageClusterStatusResource getClusterStatusResource()
return new StorageClusterStatusResource();
}
+ @Path("status/fileBasedStoreFileCleaner")
Review comment:
Also, for the REST part, let's do these things in separate issues.
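The hidden body of this hunk is presumably a JAX-RS sub-resource locator mirroring `getClusterStatusResource()` above; a guessed sketch (the method and resource class names are assumptions, not this PR's actual code):

```java
// Assumed shape of the new endpoint, mirroring the existing
// getClusterStatusResource() locator in RootResource.
@Path("status/fileBasedStoreFileCleaner")
public FileBasedStoreFileCleanerStatusResource getFileBasedStoreFileCleanerStatusResource() {
  return new FileBasedStoreFileCleanerStatusResource();
}
```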
##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FileBasedStoreFileCleaner.java
##########
@@ -0,0 +1,200 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hbase.regionserver;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hbase.ScheduledChore;
+import org.apache.hadoop.hbase.Stoppable;
+import org.apache.hadoop.hbase.io.HFileLink;
+import org.apache.hadoop.hbase.util.EnvironmentEdgeManager;
+import org.apache.hadoop.ipc.RemoteException;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.atomic.AtomicLong;
+
+/**
+ * This Chore, every time it runs, will clear the unused HFiles in the data
+ * folder.
+ */
+@InterfaceAudience.Private
+public class FileBasedStoreFileCleaner extends ScheduledChore {
Review comment:
I think we could move the scheduling logic into the SFT implementation? The
implementation could then choose whether to schedule a cleaner, and the only
condition I can see right now is whether we write to the data directory
directly: if so, we need a cleaner, otherwise we do not. And the logic of the
cleaner should be the same? Just list the data directory, compare the returned
list with the one obtained from the SFT, and if there are diffs, delete the
unused files (of course we need to deal with some corner cases, maybe by
timestamp, as discussed on jira).
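A minimal sketch of that compare-and-delete loop, assuming the SFT can hand back the set of file names it tracks and that a timestamp guard covers the in-flight-writer corner case; the helper, `trackedFileNames`, and `ttlMs` are illustrative, not this PR's code:

```java
import java.io.IOException;
import java.util.Set;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.util.EnvironmentEdgeManager;

final class StoreFileCleanerSketch {

  /**
   * Deletes files in a store's data directory that the StoreFileTracker does
   * not reference. Files newer than ttlMs are skipped so we never race with
   * an in-flight flush or compaction writer.
   */
  static void cleanStoreDirectory(FileSystem fs, Path storeDir,
      Set<String> trackedFileNames, long ttlMs) throws IOException {
    long now = EnvironmentEdgeManager.currentTime();
    for (FileStatus status : fs.listStatus(storeDir)) {
      String name = status.getPath().getName();
      // The tracker is the source of truth: anything it references stays.
      if (trackedFileNames.contains(name)) {
        continue;
      }
      // Timestamp guard for the corner cases discussed on jira.
      if (now - status.getModificationTime() > ttlMs) {
        fs.delete(status.getPath(), false);
      }
    }
  }
}
```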
##########
File path: hbase-client/src/main/java/org/apache/hadoop/hbase/ClusterMetrics.java
##########
@@ -161,6 +162,12 @@ default double getAverageLoad() {
*/
Map<TableName, RegionStatesCount> getTableRegionStatesCount();
+  /**
+   * Provide information about the FileBasedStoreFileCleaner chore
+   * @return the status of the FileBasedStoreFileCleaner chore on each region server
+   */
+  Map<String, FileBasedStoreFileCleanerStatus> getFileBasedStoreFileCleanerStatus();
Review comment:
I think the metrics could be a separate issue, which would make this PR
smaller and let us focus on the core part first.
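If the metrics do stay in some form, a hypothetical client-side consumer could look roughly like the sketch below; `FileBasedStoreFileCleanerStatus` comes from the diff above, and the rest is the standard client API:

```java
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
// The import for FileBasedStoreFileCleanerStatus is omitted: its package
// would be whatever this PR puts it in.

public final class CleanerStatusClientSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
        Admin admin = conn.getAdmin()) {
      // Proposed accessor from the ClusterMetrics diff above.
      Map<String, FileBasedStoreFileCleanerStatus> statuses =
          admin.getClusterMetrics().getFileBasedStoreFileCleanerStatus();
      statuses.forEach((server, status) ->
          System.out.println(server + " -> " + status));
    }
  }
}
```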
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]