[ 
https://issues.apache.org/jira/browse/HDFS-16967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17707432#comment-17707432
 ] 

ASF GitHub Bot commented on HDFS-16967:
---------------------------------------

goiri commented on code in PR #5523:
URL: https://github.com/apache/hadoop/pull/5523#discussion_r1154947181


##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/driver/impl/StateStoreFileBaseImpl.java:
##########
@@ -168,9 +182,30 @@ public boolean initDriver() {
       return false;
     }
     setInitialized(true);
+    int threads = getConcurrentFilesAccessNumThreads();
+    if (threads > 0) {

Review Comment:
   Should this be `> 1`?
   Technically, a pool with 1 thread would still be serial.
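
To illustrate the point of this comment, here is a minimal sketch of the guard with `> 1`. The class `AccessPoolSketch` and the method `createPoolOrNull` are hypothetical names, not part of the PR, and a plain `ThreadFactory` stands in for the `ThreadFactoryBuilder` used in the patch.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not in the PR): builds a pool only when it adds parallelism. */
final class AccessPoolSketch {

  private AccessPoolSketch() {
  }

  /** Returns a fixed-size pool for threads > 1, or null to signal serial access. */
  static ThreadPoolExecutor createPoolOrNull(int threads) {
    if (threads > 1) {
      // Daemon threads, mirroring setDaemon(true) in the patch.
      ThreadFactory factory = runnable -> {
        Thread t = new Thread(runnable, "state-store-sketch");
        t.setDaemon(true);
        return t;
      };
      return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
          new LinkedBlockingQueue<>(), factory);
    }
    // 0 or 1 thread: a pool would only add overhead, so stay serial.
    return null;
  }
}
```

With this guard, a configured value of 0 or 1 takes the serial path, so the "accessed serially" log message would stay accurate for a single thread.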



##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/store/driver/TestStateStoreFileSystem.java:
##########
@@ -41,16 +46,22 @@
 /**
  * Test the FileSystem (e.g., HDFS) implementation of the State Store driver.
  */
+@RunWith(Parameterized.class)
 public class TestStateStoreFileSystem extends TestStateStoreDriverBase {
 
   private static MiniDFSCluster dfsCluster;
 
-  @BeforeClass
-  public static void setupCluster() throws Exception {
-    Configuration conf = FederationStateStoreTestUtils
-        .getStateStoreConfiguration(StateStoreFileSystemImpl.class);
-    conf.set(StateStoreFileSystemImpl.FEDERATION_STORE_FS_PATH,
-        "/hdfs-federation/");
+  private final String numFsAsyncThreads;
+
+  public TestStateStoreFileSystem(String numFsAsyncThreads) {
+    this.numFsAsyncThreads = numFsAsyncThreads;
+  }
+
+  private static void setupCluster(String numFsAsyncThreads) throws Exception {
+    Configuration conf =
+        FederationStateStoreTestUtils.getStateStoreConfiguration(StateStoreFileSystemImpl.class);
+    conf.set(StateStoreFileSystemImpl.FEDERATION_STORE_FS_PATH, "/hdfs-federation/");
+    conf.set(FEDERATION_STORE_FS_ASYNC_THREADS, numFsAsyncThreads);

Review Comment:
   Could we use `setInt()` and pass the number as an int?
   It would be cleaner.
   I'm not sure how the Parameterized runner handles that, though.
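
On the open question about int parameters: assuming the Parameterized runner builds each test instance by invoking the constructor reflectively with an `Object[]` row (an assumption worth checking against the JUnit version in use), a boxed `Integer` should bind to a primitive `int` constructor parameter. The sketch below shows that reflection behavior with the JDK alone; `IntParamSketch` and `fromRow` are hypothetical names and no JUnit or Hadoop classes are involved.

```java
import java.lang.reflect.Constructor;

/** Hypothetical stand-in for a parameterized test class with an int constructor. */
final class IntParamSketch {

  private final int numThreads;

  IntParamSketch(int numThreads) {
    this.numThreads = numThreads;
  }

  int getNumThreads() {
    return numThreads;
  }

  /** Instantiates the class reflectively from an Object[] row of parameters. */
  static IntParamSketch fromRow(Object[] row) {
    try {
      Constructor<IntParamSketch> ctor =
          IntParamSketch.class.getDeclaredConstructor(int.class);
      // newInstance unwraps the boxed Integer to match the primitive int parameter.
      return ctor.newInstance(row);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

If the runner behaves this way, the test could declare `int numFsAsyncThreads` and call `conf.setInt(FEDERATION_STORE_FS_ASYNC_THREADS, numFsAsyncThreads)` instead of passing a string.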



##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/driver/impl/StateStoreFileBaseImpl.java:
##########
@@ -168,9 +182,30 @@ public boolean initDriver() {
       return false;
     }
     setInitialized(true);
+    int threads = getConcurrentFilesAccessNumThreads();
+    if (threads > 0) {
+      this.concurrentStoreAccessPool =
+          new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
+              new LinkedBlockingQueue<>(),
+              new ThreadFactoryBuilder()
+                  .setNameFormat("state-store-file-based-concurrent-%d")
+                  .setDaemon(true).build());
+      LOG.info("File based state store will be accessed concurrently with {} max threads", threads);
+    } else {
+      LOG.info("File based state store will be accessed serially");
+    }
     return true;
   }
 
+  @Override
+  public void close() throws Exception {
+    if (this.concurrentStoreAccessPool != null) {
+      this.concurrentStoreAccessPool.shutdown();
+      boolean isTerminated = this.concurrentStoreAccessPool.awaitTermination(5, TimeUnit.SECONDS);

Review Comment:
   ```
   this.concurrentStoreAccessPool = null;
   ```
   Set this at the end of `close()`?
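
A minimal sketch of what this suggestion could look like, assuming the pool field is only touched from the owning driver. `PoolOwnerSketch` is a hypothetical class, and the `shutdownNow()` fallback is an addition for illustration that the PR diff does not show.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical owner of a pool, illustrating clearing the field on close. */
final class PoolOwnerSketch {

  private ExecutorService pool;

  PoolOwnerSketch(int threads) {
    this.pool = Executors.newFixedThreadPool(threads);
  }

  boolean hasPool() {
    return pool != null;
  }

  void close() {
    if (pool == null) {
      return; // Already closed, so a second call is a no-op.
    }
    pool.shutdown();
    try {
      // Assumption for this sketch: force shutdown if tasks outlive the timeout.
      if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
        pool.shutdownNow();
      }
    } catch (InterruptedException e) {
      pool.shutdownNow();
      Thread.currentThread().interrupt();
    } finally {
      // Clear the reference so later code cannot submit to a dead pool.
      pool = null;
    }
  }
}
```

Clearing the reference in a `finally` block makes `close()` idempotent and lets any later null check fall back to the serial path.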





> RBF: File based state stores should allow concurrent access to the records
> --------------------------------------------------------------------------
>
>                 Key: HDFS-16967
>                 URL: https://issues.apache.org/jira/browse/HDFS-16967
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>              Labels: pull-request-available
>
> File-based state store implementations (StateStoreFileImpl and 
> StateStoreFileSystemImpl) should allow the state store records to be 
> updated and read concurrently rather than serially. Concurrent access to 
> the record files on the HDFS-based store appears to improve state store 
> cache loading performance by more than 10x.
> For instance, to maintain data integrity, the cache is reloaded whenever 
> any mount table record is updated. This reload operation appears to gain 
> a significant performance improvement from concurrent access to the mount 
> table records.
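
The concurrent read described in the issue can be sketched roughly as follows. This is not the PR's implementation: `ConcurrentLoadSketch`, `loadAll`, and the `reader` function are hypothetical, with `reader` standing in for whatever reads one record file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical sketch: read many record files in parallel, keeping input order. */
final class ConcurrentLoadSketch {

  private ConcurrentLoadSketch() {
  }

  static List<String> loadAll(List<String> paths, Function<String, String> reader,
      int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Callable<String>> tasks = new ArrayList<>();
      for (String path : paths) {
        tasks.add(() -> reader.apply(path));
      }
      List<String> records = new ArrayList<>();
      // invokeAll blocks until every task completes and returns futures in task order.
      for (Future<String> future : pool.invokeAll(tasks)) {
        records.add(future.get());
      }
      return records;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

Because each file read is dominated by I/O latency, overlapping the reads is where a speedup like the one reported would be expected to come from. The sketch creates a pool per call only for brevity; the patch keeps one pool for the lifetime of the driver.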



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

