[ https://issues.apache.org/jira/browse/HDFS-14618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875251#comment-16875251 ]
Paul Ward commented on HDFS-14618:
----------------------------------

Hi Anu,

Thank you for assigning me this.

I am not familiar enough with Hadoop internals to debug those tests. However, note that the patch acquires the locks in the same order as here:

[https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/PendingReconstructionBlocks.java#L257-L267]

I.e., this patch does *not* introduce a deadlock.

Beyond that, I don't know why this patch would cause anything to fail. Can you please take a look?

Thanks

> Incorrect synchronization of ArrayList field (ArrayList is thread-unsafe).
> --------------------------------------------------------------------------
>
> Key: HDFS-14618
> URL: https://issues.apache.org/jira/browse/HDFS-14618
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Paul Ward
> Assignee: Paul Ward
> Priority: Critical
> Labels: fix-provided, patch-available
> Attachments: race.patch
>
>
> I submitted a CR for this issue at:
> https://github.com/apache/hadoop/pull/1030
> The field {{timedOutItems}} (an {{ArrayList}}, i.e., not thread safe):
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/PendingReconstructionBlocks.java#L70
> is protected by synchronization on itself ({{timedOutItems}}):
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/PendingReconstructionBlocks.java#L167-L168
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/PendingReconstructionBlocks.java#L267-L268
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/PendingReconstructionBlocks.java#L178
> However, in one place:
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/PendingReconstructionBlocks.java#L133-L135
> it is (trying to be) protected by synchronizing on
> {{pendingReconstructions}}, but this cannot protect {{timedOutItems}}.
> Synchronizing on different objects does not ensure mutual exclusion with the
> other locations.
> I.e., 2 code locations, one synchronized on {{pendingReconstructions}} and
> the other on {{timedOutItems}}, can still be executed concurrently.
> This CR adds the synchronization on {{timedOutItems}}.
> Note that this CR keeps the synchronization on {{pendingReconstructions}}, which
> is needed for a different purpose (protecting {{pendingReconstructions}}).
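
For readers who want to see the locking pattern in isolation, here is a minimal sketch. It is not the Hadoop code: the class {{TwoLockSketch}}, its fields {{pending}} and {{timedOut}}, and its methods are hypothetical stand-ins for {{pendingReconstructions}} and {{timedOutItems}}. It only illustrates the idea described above: the list is touched exclusively while holding the list's own monitor, and the one path that also needs the map lock takes the map lock first and the list lock second.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in, NOT PendingReconstructionBlocks itself.
public class TwoLockSketch {
    // Assumed analogue of pendingReconstructions (guarded by its own monitor).
    private final Map<String, Integer> pending = new HashMap<>();
    // Assumed analogue of timedOutItems (a plain, thread-unsafe ArrayList).
    private final List<String> timedOut = new ArrayList<>();

    public void addPending(String block) {
        synchronized (pending) {
            pending.put(block, 1);
        }
    }

    // Moves a block from the map to the list.
    // Lock order: map lock first, then list lock.
    public boolean expire(String block) {
        synchronized (pending) {
            if (pending.remove(block) == null) {
                return false;
            }
            // Holding only the map lock would NOT exclude drainTimedOut(),
            // which locks a different object. The list needs its own lock.
            synchronized (timedOut) {
                timedOut.add(block);
            }
            return true;
        }
    }

    // Takes only the list lock, so it can never wait on the map lock
    // and cannot form a lock-order cycle with expire().
    public List<String> drainTimedOut() {
        synchronized (timedOut) {
            List<String> copy = new ArrayList<>(timedOut);
            timedOut.clear();
            return copy;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        TwoLockSketch s = new TwoLockSketch();
        final int n = 10_000;
        for (int i = 0; i < n; i++) {
            s.addPending("blk_" + i);
        }
        AtomicInteger drained = new AtomicInteger();
        Thread expirer = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                s.expire("blk_" + i);
            }
        });
        // Drains concurrently until every expired block has been seen.
        Thread drainer = new Thread(() -> {
            while (drained.get() < n) {
                drained.addAndGet(s.drainTimedOut().size());
            }
        });
        expirer.start();
        drainer.start();
        expirer.join();
        drainer.join();
        System.out.println("drained " + drained.get() + " of " + n);
    }
}
```

If the inner {{synchronized (timedOut)}} in {{expire}} were removed, {{add}} and {{clear}} could run on the {{ArrayList}} at the same time, which is the race described in the issue.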