[
https://issues.apache.org/jira/browse/HDFS-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Wang updated HDFS-4816:
------------------------------
Attachment: hdfs-4816-4.patch
Thanks for the review Colin. Newest patch adds a print on InterruptedException,
my test output shows the expected interrupt during the get on the Future.
{noformat}
2013-06-26 13:49:15,797 INFO ha.StandbyCheckpointer
(StandbyCheckpointer.java:doWork(332)) - Interrupted during checkpointing
java.lang.InterruptedException
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:200)
at
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.access$1400(StandbyCheckpointer.java:61)
at
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.doWork(StandbyCheckpointer.java:325)
at
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.access$600(StandbyCheckpointer.java:238)
at
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread$1.run(StandbyCheckpointer.java:258)
at
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
at
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.run(StandbyCheckpointer.java:254)
2013-06-26 13:49:15,798 WARN ha.EditLogTailer (EditLogTailer.java:doWork(336))
- Edit log tailer interrupted
java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:334)
at
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279)
at
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
at
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
at
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)
{noformat}
> transitionToActive blocks if the SBN is doing checkpoint image transfer
> -----------------------------------------------------------------------
>
> Key: HDFS-4816
> URL: https://issues.apache.org/jira/browse/HDFS-4816
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.0.0, 2.0.4-alpha
> Reporter: Andrew Wang
> Assignee: Andrew Wang
> Attachments: hdfs-4816-1.patch, hdfs-4816-2.patch, hdfs-4816-3.patch,
> hdfs-4816-4.patch, hdfs-4816-slow-shutdown.txt, stacks.out
>
>
> The NN and SBN do this dance during checkpoint image transfer with nested
> HTTP GETs via {{HttpURLConnection}}. When an admin does a
> {{-transitionToActive}} during this transfer, part of that is interrupting an
> ongoing checkpoint so we can transition immediately.
> However, the {{thread.interrupt()}} in {{StandbyCheckpointer#stop}} gets
> swallowed by {{connection.getResponseCode()}} in
> {{TransferFsImage#doGetUrl}}. None of the methods in HttpURLConnection throw
> InterruptedException, so we need to do something else (perhaps HttpClient
> [1]):
> [1]: http://hc.apache.org/httpclient-3.x/
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira