[jira] [Commented] (HDFS-17049) EC: Fix duplicate block group IDs generated by SequentialBlockGroupIdGenerator

ASF GitHub Bot (Jira) Wed, 14 Jun 2023 01:17:04 -0700


    [ https://issues.apache.org/jira/browse/HDFS-17049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732439#comment-17732439 ]


ASF GitHub Bot commented on HDFS-17049:
---------------------------------------

zhangshuyan0 commented on code in PR #5743:
URL: https://github.com/apache/hadoop/pull/5743#discussion_r1229203981


##########
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestSequentialBlockGroupId.java:
##########
@@ -172,6 +176,41 @@ public void testTriggerBlockGroupIdCollision() throws IOException {
     }
   }
 
+  /**
+   * Test that the values generated by blockGroup ID generator are unique,
+   * even if they are generated concurrently.
+   * @throws Exception
+   */
+  @Test
+  public void testBlockGroupIdThreadSafety() throws Exception {
+    List<List<Long>> blockIds = new ArrayList<>();
+    List<Thread> threads = new ArrayList<>();
+    for (int i = 0; i < 20; i++) {
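+      blockIds.add(new ArrayList<>());
+      // Capture the loop index so each thread writes only to its own list.
+      final int threadIndex = i;
+      threads.add(new Thread(() -> {
+        for (int j = 0; j < 1000; j++) {
+          long next = blockGrpIdGenerator.nextValue();
+          blockIds.get(threadIndex).add(next);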
+        }
+      }));
+    }
+    for (Thread t : threads) {
+      t.start();
+    }
+    for (Thread t : threads) {
+      t.join();
+    }
+    Set<Long> allBlockIds = new HashSet<>();

Review Comment:
   Sorry, I don't understand how to use a `HashMap` to get a simpler 
implementation. I just modified the code as you suggested above. 
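
For reference, one possible way to simplify the uniqueness check is sketched below. It is not necessarily what the reviewer had in mind: every thread adds its IDs to a single concurrent set, and the test then compares the set size with the number of IDs requested. `UniqueIdCheck`, `collectConcurrently` and the `AtomicLong` stand-in are hypothetical names used only for this illustration; in the real test the supplier would be `blockGrpIdGenerator::nextValue`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

class UniqueIdCheck {
  /**
   * Starts numThreads threads that each draw idsPerThread values from the
   * generator and returns the set of distinct values observed.
   */
  static Set<Long> collectConcurrently(LongSupplier generator,
      int numThreads, int idsPerThread) {
    // A concurrent set lets all threads share one collection safely.
    Set<Long> seen = ConcurrentHashMap.newKeySet();
    List<Thread> threads = new ArrayList<>();
    for (int i = 0; i < numThreads; i++) {
      threads.add(new Thread(() -> {
        for (int j = 0; j < idsPerThread; j++) {
          seen.add(generator.getAsLong());
        }
      }));
    }
    for (Thread t : threads) {
      t.start();
    }
    for (Thread t : threads) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return seen;
  }

  public static void main(String[] args) {
    // Hypothetical stand-in for the real generator under test.
    AtomicLong standIn = new AtomicLong();
    Set<Long> ids = collectConcurrently(standIn::incrementAndGet, 20, 1000);
    // Any duplicate collapses into one entry, so the size reveals collisions.
    System.out.println(ids.size() == 20 * 1000
        ? "all unique" : "duplicates found");
  }
}
```

With this shape the per-thread lists and the final merge loop are no longer needed, at the cost of losing the information about which thread produced which ID.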





> EC: Fix duplicate block group IDs generated by SequentialBlockGroupIdGenerator
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-17049
>                 URL: https://issues.apache.org/jira/browse/HDFS-17049
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Shuyan Zhang
>            Assignee: Shuyan Zhang
>            Priority: Major
>              Labels: pull-request-available
>
> When I used multiple clients to write EC files concurrently, I found that 
> NameNode generated the same block group ID for different files:
> ```
> 2023-06-13 20:09:59,514 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocate blk_-9223372036854697568_14389 for /ec-test/10/4068034329705654124
> 2023-06-13 20:09:59,514 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocate blk_-9223372036854697568_14390 for /ec-test/19/7042966144171770731
> ```
> After diving into `SequentialBlockGroupIdGenerator`, I found that the current 
> implementation of `nextValue` is not thread-safe.
> This problem must be fixed.
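
To illustrate the kind of race described above, here is a minimal sketch of a sequential ID generator whose `nextValue` reads, advances and stores its state under a single lock. It is a simplified model, not the actual Hadoop implementation or the fix in the PR: the class name `SketchBlockGroupIdGenerator`, the step of 16 and the starting value are assumptions made for the example. Without `synchronized`, two threads could read the same `current` value and return the same ID.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class SketchBlockGroupIdGenerator {
  // Assumed spacing between consecutive IDs; illustrative only.
  static final long STEP = 16;

  private long current;

  SketchBlockGroupIdGenerator(long initial) {
    this.current = initial;
  }

  /**
   * Reads, advances and stores the counter while holding the object's lock,
   * so no two callers can observe the same value.
   */
  synchronized long nextValue() {
    long next = current + STEP;
    current = next;
    return next;
  }

  /** Draws IDs from several threads and returns how many were distinct. */
  static int countUnique(int numThreads, int idsPerThread) {
    SketchBlockGroupIdGenerator generator =
        new SketchBlockGroupIdGenerator(Long.MIN_VALUE);
    Set<Long> seen = ConcurrentHashMap.newKeySet();
    Thread[] threads = new Thread[numThreads];
    for (int i = 0; i < numThreads; i++) {
      threads[i] = new Thread(() -> {
        for (int j = 0; j < idsPerThread; j++) {
          seen.add(generator.nextValue());
        }
      });
      threads[i].start();
    }
    for (Thread t : threads) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return seen.size();
  }

  public static void main(String[] args) {
    int unique = countUnique(20, 1000);
    System.out.println(unique == 20 * 1000 ? "no duplicates" : "duplicates");
  }
}
```

Removing the `synchronized` keyword from `nextValue` in this sketch makes the read and the write separate steps again, which is the situation in which duplicate IDs can appear under concurrent load.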



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

