[
https://issues.apache.org/jira/browse/HDFS-16967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17707432#comment-17707432
]
ASF GitHub Bot commented on HDFS-16967:
---------------------------------------
goiri commented on code in PR #5523:
URL: https://github.com/apache/hadoop/pull/5523#discussion_r1154947181
##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/driver/impl/StateStoreFileBaseImpl.java:
##########
@@ -168,9 +182,30 @@ public boolean initDriver() {
return false;
}
setInitialized(true);
+ int threads = getConcurrentFilesAccessNumThreads();
+ if (threads > 0) {
Review Comment:
Should it be >1?
Technically 1 thread would be serial.
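The point about a single thread can be checked with a small standalone sketch. It uses only `java.util.concurrent`; the class and method names (`PoolWidthDemo`, `peakConcurrency`) are hypothetical and not part of the PR. It measures how many tasks ever run at the same time on a fixed-size pool, which for one worker is always one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not taken from the PR.
final class PoolWidthDemo {

  /** Runs short tasks on a fixed pool and returns the peak number running at once. */
  static int peakConcurrency(int threads, int tasks) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    AtomicInteger running = new AtomicInteger();
    AtomicInteger peak = new AtomicInteger();
    List<Future<?>> futures = new ArrayList<>();
    try {
      for (int i = 0; i < tasks; i++) {
        futures.add(pool.submit(() -> {
          // Record how many tasks are active right now, keeping the maximum.
          int now = running.incrementAndGet();
          peak.accumulateAndGet(now, Math::max);
          try {
            Thread.sleep(20);
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          } finally {
            running.decrementAndGet();
          }
        }));
      }
      // Wait for every task so the peak value is final.
      for (Future<?> f : futures) {
        f.get();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdownNow();
    }
    return peak.get();
  }

  public static void main(String[] args) {
    System.out.println("1 thread peak: " + peakConcurrency(1, 6));
    System.out.println("4 threads peak: " + peakConcurrency(4, 8));
  }
}
```

With one worker the tasks still go through the executor queue, so a `> 1` guard would only skip the pool overhead in a case that gains no parallelism.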
##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/store/driver/TestStateStoreFileSystem.java:
##########
@@ -41,16 +46,22 @@
/**
* Test the FileSystem (e.g., HDFS) implementation of the State Store driver.
*/
+@RunWith(Parameterized.class)
public class TestStateStoreFileSystem extends TestStateStoreDriverBase {
private static MiniDFSCluster dfsCluster;
- @BeforeClass
- public static void setupCluster() throws Exception {
- Configuration conf = FederationStateStoreTestUtils
- .getStateStoreConfiguration(StateStoreFileSystemImpl.class);
- conf.set(StateStoreFileSystemImpl.FEDERATION_STORE_FS_PATH,
- "/hdfs-federation/");
+ private final String numFsAsyncThreads;
+
+ public TestStateStoreFileSystem(String numFsAsyncThreads) {
+ this.numFsAsyncThreads = numFsAsyncThreads;
+ }
+
+ private static void setupCluster(String numFsAsyncThreads) throws Exception {
+ Configuration conf =
+ FederationStateStoreTestUtils.getStateStoreConfiguration(StateStoreFileSystemImpl.class);
+ conf.set(StateStoreFileSystemImpl.FEDERATION_STORE_FS_PATH, "/hdfs-federation/");
+ conf.set(FEDERATION_STORE_FS_ASYNC_THREADS, numFsAsyncThreads);
Review Comment:
Could we make it `setInt()` and pass the number as an int?
That would be cleaner, though I'm not sure how `Parameterized` handles that.
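On the int question, a rough sketch of the underlying mechanism, assuming the runner builds the test instance by calling the constructor reflectively with one row of parameters (an assumption about JUnit's `Parameterized`, not something stated in this thread). `IntParamSketch` and `FakeTest` are hypothetical stand-ins; only `java.lang.reflect` is used. It shows that a boxed `Integer` in an `Object[]` row is accepted by a constructor declared with a primitive `int`.

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-in; this is not JUnit code and not code from the PR.
final class IntParamSketch {

  /** Mimics a parameterized test class whose constructor takes an int. */
  static final class FakeTest {
    final int numFsAsyncThreads;

    FakeTest(int numFsAsyncThreads) {
      this.numFsAsyncThreads = numFsAsyncThreads;
    }
  }

  /** Builds a FakeTest from one parameter row, the way a reflective runner would. */
  static int construct(Object[] row) {
    try {
      Constructor<FakeTest> ctor = FakeTest.class.getDeclaredConstructor(int.class);
      // newInstance unwraps the boxed Integer in the row to the primitive int.
      return ctor.newInstance(row).numFsAsyncThreads;
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(construct(new Object[] {4}));
  }
}
```

If that assumption holds, the parameter rows could carry integers directly and the test could call `conf.setInt(...)` with the value.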
##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/driver/impl/StateStoreFileBaseImpl.java:
##########
@@ -168,9 +182,30 @@ public boolean initDriver() {
return false;
}
setInitialized(true);
+ int threads = getConcurrentFilesAccessNumThreads();
+ if (threads > 0) {
+ this.concurrentStoreAccessPool =
+ new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
+ new LinkedBlockingQueue<>(),
+ new ThreadFactoryBuilder()
+ .setNameFormat("state-store-file-based-concurrent-%d")
+ .setDaemon(true).build());
+ LOG.info("File based state store will be accessed concurrently with {} max threads", threads);
+ } else {
+ LOG.info("File based state store will be accessed serially");
+ }
return true;
}
+ @Override
+ public void close() throws Exception {
+ if (this.concurrentStoreAccessPool != null) {
+ this.concurrentStoreAccessPool.shutdown();
+ boolean isTerminated = this.concurrentStoreAccessPool.awaitTermination(5, TimeUnit.SECONDS);
Review Comment:
```
this.concurrentStoreAccessPool = null;
```
At the end?
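A minimal sketch of what the suggested shutdown could look like, written as a standalone class rather than the actual driver. `PooledStoreSketch` and `hasPool()` are hypothetical names, the `threads > 1` guard follows the other review question rather than the PR as posted, and falling back to `shutdownNow()` is an assumption about what to do when the 5-second wait expires.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the driver; only the pool lifecycle is modeled.
final class PooledStoreSketch implements AutoCloseable {
  private ExecutorService concurrentStoreAccessPool;

  PooledStoreSketch(int threads) {
    // Assumption: only create a pool when it can actually run tasks in parallel.
    if (threads > 1) {
      this.concurrentStoreAccessPool = Executors.newFixedThreadPool(threads);
    }
  }

  boolean hasPool() {
    return this.concurrentStoreAccessPool != null;
  }

  @Override
  public void close() {
    if (this.concurrentStoreAccessPool == null) {
      return;
    }
    try {
      // Stop accepting new tasks and give queued ones a bounded time to finish.
      this.concurrentStoreAccessPool.shutdown();
      if (!this.concurrentStoreAccessPool.awaitTermination(5, TimeUnit.SECONDS)) {
        this.concurrentStoreAccessPool.shutdownNow();
      }
    } catch (InterruptedException e) {
      this.concurrentStoreAccessPool.shutdownNow();
      Thread.currentThread().interrupt();
    } finally {
      // Clear the reference at the end so a second close() is a no-op.
      this.concurrentStoreAccessPool = null;
    }
  }
}
```

Clearing the field in `finally` makes `close()` idempotent and keeps later code from submitting work to a pool that has already been shut down.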
> RBF: File based state stores should allow concurrent access to the records
> --------------------------------------------------------------------------
>
> Key: HDFS-16967
> URL: https://issues.apache.org/jira/browse/HDFS-16967
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
> Labels: pull-request-available
>
> File based state store implementations (StateStoreFileImpl and
> StateStoreFileSystemImpl) should allow updating as well as reading of the
> state store records concurrently rather than serially. Concurrent access to
> the record files on the hdfs based store seems to be improving the state
> store cache loading performance by more than 10x.
> For instance, in order to maintain data integrity, when any mount table
> record(s) is updated, the cache is reloaded. This reload operation seems to
> be able to gain significant performance improvement by the concurrent access
> of the mount table records.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)