[ https://issues.apache.org/jira/browse/HDFS-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated HDFS-4816:
------------------------------

    Attachment: hdfs-4816-4.patch

Thanks for the review, Colin. The newest patch adds a print on 
InterruptedException, and my test output shows the expected interrupt during 
the get on the Future.

{noformat}
2013-06-26 13:49:15,797 INFO  ha.StandbyCheckpointer (StandbyCheckpointer.java:doWork(332)) - Interrupted during checkpointing
java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:200)
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.access$1400(StandbyCheckpointer.java:61)
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.doWork(StandbyCheckpointer.java:325)
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.access$600(StandbyCheckpointer.java:238)
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread$1.run(StandbyCheckpointer.java:258)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.run(StandbyCheckpointer.java:254)
2013-06-26 13:49:15,798 WARN  ha.EditLogTailer (EditLogTailer.java:doWork(336)) - Edit log tailer interrupted
java.lang.InterruptedException: sleep interrupted
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:334)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)
{noformat}
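
For readers following the trace: the interrupt lands while the checkpointer thread is parked in FutureTask.get, which is an interruptible wait. Below is a minimal sketch of that general pattern, not the code from the patch. The names InterruptibleWait, runInterruptibly, and demo are hypothetical, and the "stubborn" task is only a stand-in for a blocking call that does not respond to interrupts.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration; not part of HDFS or of the attached patch.
final class InterruptibleWait {

  /**
   * Runs blockingWork on a helper thread and waits for it on the calling
   * thread. Returns true if the work completed, false if the caller was
   * interrupted while waiting.
   */
  static boolean runInterruptibly(Callable<Void> blockingWork)
      throws ExecutionException {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Future<Void> result = executor.submit(blockingWork);
    try {
      result.get();  // interruptible, unlike the blocking call itself
      return true;
    } catch (InterruptedException ie) {
      result.cancel(true);                 // best effort; the work may ignore it
      Thread.currentThread().interrupt();  // preserve the interrupt status
      return false;
    } finally {
      executor.shutdownNow();
    }
  }

  /** Interrupts a waiter whose work ignores interrupts; true if it stops promptly. */
  static boolean demo() {
    final CountDownLatch started = new CountDownLatch(1);
    final CountDownLatch release = new CountDownLatch(1);
    final AtomicBoolean sawInterrupt = new AtomicBoolean(false);

    // Stand-in for a blocking call that swallows interrupts.
    Callable<Void> stubbornWork = () -> {
      started.countDown();
      while (release.getCount() > 0) {
        try {
          release.await();
        } catch (InterruptedException swallowed) {
          // Deliberately ignored to mimic a call that cannot be interrupted.
        }
      }
      return null;
    };

    Thread waiter = new Thread(() -> {
      try {
        sawInterrupt.set(!runInterruptibly(stubbornWork));
      } catch (ExecutionException e) {
        // Not expected in this sketch; sawInterrupt stays false.
      }
    });
    waiter.start();
    try {
      started.await();       // make sure the work is already running
      waiter.interrupt();    // plays the role of a stop request
      waiter.join(5000);
      return !waiter.isAlive() && sawInterrupt.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      release.countDown();   // let the stubborn task finish so the JVM can exit
    }
  }
}
```

Note that in this sketch the waiting thread returns promptly, but the helper thread keeps running until the underlying work gives up on its own; unblocking the work itself is a separate concern.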
                
> transitionToActive blocks if the SBN is doing checkpoint image transfer
> -----------------------------------------------------------------------
>
>                 Key: HDFS-4816
>                 URL: https://issues.apache.org/jira/browse/HDFS-4816
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.0.0, 2.0.4-alpha
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: hdfs-4816-1.patch, hdfs-4816-2.patch, hdfs-4816-3.patch, 
> hdfs-4816-4.patch, hdfs-4816-slow-shutdown.txt, stacks.out
>
>
> The NN and SBN do this dance during checkpoint image transfer with nested 
> HTTP GETs via {{HttpURLConnection}}. When an admin runs 
> {{-transitionToActive}} during this transfer, part of the transition is 
> interrupting the ongoing checkpoint so we can transition immediately.
> However, the {{thread.interrupt()}} in {{StandbyCheckpointer#stop}} gets 
> swallowed by {{connection.getResponseCode()}} in 
> {{TransferFsImage#doGetUrl}}. None of the methods in {{HttpURLConnection}} 
> throw {{InterruptedException}}, so we need to do something else (perhaps 
> HttpClient [1]).
> [1]: http://hc.apache.org/httpclient-3.x/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
