[ https://issues.apache.org/jira/browse/HDFS-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907920#comment-13907920 ]
Andrew Wang commented on HDFS-5064:
-----------------------------------
Hi ATM, I looked at this patch. It needs a small rebase for the lock fairness
change, but I was still able to review. I have just one nit: reads and writes
of non-volatile 64-bit fields are not guaranteed to be atomic under the current
Java memory model, so we need to slap a volatile on
{{NNStorage#mostRecentCheckpointId}} since the getter is no longer synchronized.
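
To illustrate the volatile nit, here is a minimal sketch. {{CheckpointIdHolder}} and its methods are hypothetical stand-ins rather than the actual {{NNStorage}} code, and the sketch assumes the field is a {{long}} whose setter stays synchronized while the getter does not:

```java
// Hypothetical stand-in for the relevant part of NNStorage; not the real class.
final class CheckpointIdHolder {
    // volatile makes every read and write of this 64-bit field atomic and
    // visible across threads, so an unsynchronized getter cannot observe a
    // torn value (half of an old write, half of a new one).
    private volatile long mostRecentCheckpointId = -1L;

    // Assumption: the writer side keeps its synchronization.
    synchronized void setMostRecentCheckpointId(long txId) {
        mostRecentCheckpointId = txId;
    }

    // No longer synchronized; correctness relies on the volatile modifier.
    long getMostRecentCheckpointId() {
        return mostRecentCheckpointId;
    }
}
```

Without the volatile modifier, the language spec allows a JVM to split the 64-bit access into two 32-bit halves, so the getter could in principle see a mixed value.
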
At a high level, this makes sense to me as an intermediate solution for the
specific issue of the SbNN and checkpointing, until we actually separate out
block management from the namespace. Kihwal, do you have any reservations about
this approach?
Otherwise, I'm +1 for this change pending rebase and Jenkins.
> Standby checkpoints should not block concurrent readers
> -------------------------------------------------------
>
> Key: HDFS-5064
> URL: https://issues.apache.org/jira/browse/HDFS-5064
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha, namenode
> Affects Versions: 2.3.0
> Reporter: Aaron T. Myers
> Assignee: Aaron T. Myers
> Attachments: HDFS-5064.patch
>
>
> We've observed an issue which causes fetches of the {{/jmx}} page of the NN
> to take a long time to load when the standby is in the process of creating a
> checkpoint.
> Even though both creating the checkpoint and gathering the statistics for
> {{/jmx}} take only the FSNS read lock, the issue is that since the FSNS uses
> a _fair_ RW lock, a single writer attempting to get the lock will block all
> threads attempting to get only the read lock for the duration of the
> checkpoint. This will cause {{/jmx}}, and really any thread only attempting
> to get the read lock, to block for the duration of the checkpoint, even
> though they should be able to proceed concurrently with the checkpointing
> thread.
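
The blocking described above can be reproduced outside HDFS with a plain fair {{ReentrantReadWriteLock}}. This is a minimal sketch, assuming the FSNS lock behaves like that class in fair mode; {{FairLockDemo}} and its method are hypothetical names, and the main thread stands in for the checkpointer holding the read lock:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; not part of HDFS.
final class FairLockDemo {
    /** Returns true if a new reader was blocked while a writer was queued. */
    static boolean readerBlockedBehindQueuedWriter() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true); // fair mode
        AtomicBoolean acquired = new AtomicBoolean(false);
        try {
            // Stand-in for the checkpoint: a long-held read lock.
            lock.readLock().lock();

            // A single writer queues up behind the held read lock.
            Thread writer = new Thread(() -> {
                lock.writeLock().lock();
                lock.writeLock().unlock();
            });
            writer.start();
            while (!lock.hasQueuedThread(writer)) {
                Thread.sleep(1);
            }

            // Stand-in for a /jmx request: a second reader arriving later.
            // The timed tryLock honors fairness, so it waits behind the writer.
            Thread reader = new Thread(() -> {
                try {
                    if (lock.readLock().tryLock(200, TimeUnit.MILLISECONDS)) {
                        acquired.set(true);
                        lock.readLock().unlock();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            reader.start();
            reader.join();

            // Finish the "checkpoint" so the writer can run and exit.
            lock.readLock().unlock();
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !acquired.get();
    }
}
```

In this sketch the second reader times out even though only a read lock is held, because the fair policy makes it queue behind the waiting writer.
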