[
https://issues.apache.org/jira/browse/HBASE-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344391#comment-14344391
]
Hudson commented on HBASE-13136:
--------------------------------
FAILURE: Integrated in HBase-1.1 #239 (See
[https://builds.apache.org/job/HBase-1.1/239/])
HBASE-13136 TestSplitLogManager.testGetPreviousRecoveryMode is flakey (stack:
rev 10d5236e665ebdc4f8100c0fca81923f8e69f37b)
* hbase-server/src/main/java/org/apache/hadoop/hbase/coordination/ZKSplitLogManagerCoordination.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKSplitLog.java
> TestSplitLogManager.testGetPreviousRecoveryMode is flakey
> ---------------------------------------------------------
>
> Key: HBASE-13136
> URL: https://issues.apache.org/jira/browse/HBASE-13136
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.0.0, 1.1.0
> Reporter: zhangduo
> Assignee: zhangduo
> Attachments: HBASE-13136.patch
>
>
> Add test code to run it 100 times, then we can make it fail every time.
> {code:title=TestSplitLogManager.java}
> @Test
> public void test() throws Exception {
>   for (int i = 0; i < 100; i++) {
>     setup();
>     testGetPreviousRecoveryMode();
>     teardown();
>   }
> }
> {code}
> And then add some ugly debug logs (yeah, I usually debug in this way...):
> {code:title=ZKSplitLogManagerCoordination.java}
> @Override
> public void setRecoveryMode(boolean isForInitialization) throws IOException {
>   synchronized(this) {
>     if (this.isDrainingDone) {
>       // when there is no outstanding splitlogtask after master start up, we already
>       // have up to date recovery mode
>       return;
>     }
>   }
>   if (this.watcher == null) {
>     // when watcher is null(testing code) and recovery mode can only be LOG_SPLITTING
>     synchronized(this) {
>       this.isDrainingDone = true;
>       this.recoveryMode = RecoveryMode.LOG_SPLITTING;
>     }
>     return;
>   }
>   boolean hasSplitLogTask = false;
>   boolean hasRecoveringRegions = false;
>   RecoveryMode previousRecoveryMode = RecoveryMode.UNKNOWN;
>   RecoveryMode recoveryModeInConfig =
>       (isDistributedLogReplay(conf)) ? RecoveryMode.LOG_REPLAY : RecoveryMode.LOG_SPLITTING;
>   // Firstly check if there are outstanding recovering regions
>   try {
>     List<String> regions = ZKUtil.listChildrenNoWatch(watcher, watcher.recoveringRegionsZNode);
>     LOG.debug("=======" + regions);
>     if (regions != null && !regions.isEmpty()) {
>       hasRecoveringRegions = true;
>       previousRecoveryMode = RecoveryMode.LOG_REPLAY;
>     }
>     if (previousRecoveryMode == RecoveryMode.UNKNOWN) {
>       // Secondly check if there are outstanding split log task
>       List<String> tasks = ZKUtil.listChildrenNoWatch(watcher, watcher.splitLogZNode);
>       LOG.debug("=======" + tasks);
>       if (tasks != null && !tasks.isEmpty()) {
>         hasSplitLogTask = true;
>         if (isForInitialization) {
>           // during initialization, try to get recovery mode from splitlogtask
>           int listSize = tasks.size();
>           for (int i = 0; i < listSize; i++) {
>             String task = tasks.get(i);
>             try {
>               byte[] data =
>                   ZKUtil.getData(this.watcher, ZKUtil.joinZNode(watcher.splitLogZNode, task));
>               if (data == null) continue;
>               SplitLogTask slt = SplitLogTask.parseFrom(data);
>               previousRecoveryMode = slt.getMode();
>               if (previousRecoveryMode == RecoveryMode.UNKNOWN) {
>                 // created by old code base where we don't set recovery mode in splitlogtask
>                 // we can safely set to LOG_SPLITTING because we're in master initialization
>                 // code before SSH is enabled & there is no outstanding recovering regions
>                 previousRecoveryMode = RecoveryMode.LOG_SPLITTING;
>               }
>               break;
>             } catch (DeserializationException e) {
>               LOG.warn("Failed parse data for znode " + task, e);
>             } catch (InterruptedException e) {
>               throw new InterruptedIOException();
>             }
>           }
>         }
>       }
>     }
>   } catch (KeeperException e) {
>     throw new IOException(e);
>   }
>   synchronized (this) {
>     if (this.isDrainingDone) {
>       return;
>     }
>     if (!hasSplitLogTask && !hasRecoveringRegions) {
>       this.isDrainingDone = true;
>       LOG.debug("====set to " + recoveryModeInConfig);
>       this.recoveryMode = recoveryModeInConfig;
>       return;
>     } else if (!isForInitialization) {
>       // splitlogtask hasn't drained yet, keep existing recovery mode
>       return;
>     }
>     if (previousRecoveryMode != RecoveryMode.UNKNOWN) {
>       LOG.debug("====set to " + previousRecoveryMode);
>       this.isDrainingDone = (previousRecoveryMode == recoveryModeInConfig);
>       this.recoveryMode = previousRecoveryMode;
>     } else {
>       LOG.debug("====set to " + recoveryModeInConfig);
>       this.recoveryMode = recoveryModeInConfig;
>     }
>   }
> }
> {code}
> When failing, I got this
> {noformat}
> 2015-03-02 12:26:12,555 INFO [main] master.TestSplitLogManager(654): testGetPreviousRecoveryMode
> 2015-03-02 12:26:12,600 DEBUG [main-EventThread] zookeeper.ZooKeeperWatcher(388): split-log-manager-tests465da653-db4c-475b-bf73-91ac0ca13323-0x14bd8bbf5d90000, quorum=localhost:53589, baseZNode=/hbase Received ZooKeeper Event, type=None, state=Disconnected, path=null
> 2015-03-02 12:26:12,600 DEBUG [main-EventThread] zookeeper.ZooKeeperWatcher(477): split-log-manager-tests465da653-db4c-475b-bf73-91ac0ca13323-0x14bd8bbf5d90000, quorum=localhost:53589, baseZNode=/hbase Received Disconnected from ZooKeeper, ignoring
> 2015-03-02 12:26:12,621 DEBUG [main] coordination.ZKSplitLogManagerCoordination(875): Distributed log replay=true
> 2015-03-02 12:26:12,626 DEBUG [main] coordination.ZKSplitLogManagerCoordination(798): =======null
> 2015-03-02 12:26:12,627 DEBUG [main] coordination.ZKSplitLogManagerCoordination(806): =======[testRecovery]
> 2015-03-02 12:26:12,628 DEBUG [main] coordination.ZKSplitLogManagerCoordination(855): ====set to LOG_SPLITTING
> 2015-03-02 12:26:12,629 INFO [main] coordination.ZKSplitLogManagerCoordination(594): found orphan task testRecovery
> 2015-03-02 12:26:12,630 INFO [main] coordination.ZKSplitLogManagerCoordination(598): Found 1 orphan tasks and 0 rescan nodes
> 2015-03-02 12:26:12,631 DEBUG [main-EventThread] coordination.ZKSplitLogManagerCoordination(464): task not yet acquired /hbase/splitWAL/testRecovery ver = 0
> 2015-03-02 12:26:12,631 INFO [main-EventThread] coordination.ZKSplitLogManagerCoordination(548): creating orphan task /hbase/splitWAL/testRecovery
> 2015-03-02 12:26:12,631 INFO [main-EventThread] coordination.ZKSplitLogManagerCoordination(178): resubmitting unassigned orphan task /hbase/splitWAL/testRecovery
> 2015-03-02 12:26:12,631 INFO [main-EventThread] coordination.ZKSplitLogManagerCoordination(229): resubmitting task /hbase/splitWAL/testRecovery
> 2015-03-02 12:26:12,632 INFO [main] master.TestSplitLogManager(665): Mode1=LOG_SPLITTING
> 2015-03-02 12:26:12,633 DEBUG [main-EventThread] zookeeper.ZooKeeperWatcher(388): split-log-manager-tests9317eabf-735c-4a9d-bc5b-8a9269bb6d62-0x14bd8bbf8be0000, quorum=localhost:55209, baseZNode=/hbase Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/hbase/splitWAL/testRecovery
> 2015-03-02 12:26:12,639 INFO [main] master.TestSplitLogManager(668): Mode2=LOG_SPLITTING
> 2015-03-02 12:26:12,640 DEBUG [main] coordination.ZKSplitLogManagerCoordination(875): Distributed log replay=true
> 2015-03-02 12:26:12,641 WARN [main-EventThread] coordination.ZKSplitLogManagerCoordination$GetDataAsyncCallback(1000): task znode /hbase/splitWAL/testRecovery vanished or not created yet.
> 2015-03-02 12:26:12,641 DEBUG [main] coordination.ZKSplitLogManagerCoordination(798): =======null
> 2015-03-02 12:26:12,642 INFO [main-EventThread] coordination.ZKSplitLogManagerCoordination(472): task /hbase/splitWAL/RESCAN0000000001 entered state: DONE dummy-master,1,1
> 2015-03-02 12:26:12,642 DEBUG [main] coordination.ZKSplitLogManagerCoordination(806): =======[RESCAN0000000001]
> 2015-03-02 12:26:12,642 INFO [main] master.TestSplitLogManager(670): Mode3=LOG_SPLITTING
> 2015-03-02 12:26:12,642 INFO [main] hbase.ChoreService(303): Chore service for: dummy-master,1,1_splitLogManager_ had [java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@35cd95e8] on shutdown
> 2015-03-02 12:26:12,643 INFO [main] zookeeper.MiniZooKeeperCluster(244): Shutdown MiniZK cluster with all ZK servers
> {noformat}
> We got some tasks here, which caused hasSplitLogTask to become true.
> {code:title=ZKSplitLogManagerCoordination.java}
> List<String> tasks = ZKUtil.listChildrenNoWatch(watcher, watcher.splitLogZNode);
> LOG.debug("=======" + tasks);
> if (tasks != null && !tasks.isEmpty()) {
>   hasSplitLogTask = true;
> {code}
> So we skipped the mode setting stage and returned.
> Not sure if it is a race condition in ZKSplitLogManagerCoordination or just a
> unit test issue.
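> If this turns out to be just the unit test leaking state between iterations, one possible test-side mitigation (only a sketch, not a patch: waitForSplitLogZNodeToDrain is a hypothetical helper, the timeout is arbitrary, and it assumes the test's ZooKeeperWatcher field is named zkw) would be to wait in teardown() until the previous iteration's children of /hbase/splitWAL (e.g. the resubmitted orphan testRecovery task) are gone before the next setup() runs:
> {code:title=TestSplitLogManager.java (sketch)}
> // Hypothetical helper: poll the split log znode until it has no children, so a
> // task left over from the previous iteration cannot make the next
> // setRecoveryMode() call see hasSplitLogTask == true.
> private void waitForSplitLogZNodeToDrain(long timeoutMs) throws Exception {
>   long deadline = System.currentTimeMillis() + timeoutMs;
>   while (System.currentTimeMillis() < deadline) {
>     List<String> tasks = ZKUtil.listChildrenNoWatch(zkw, zkw.splitLogZNode);
>     if (tasks == null || tasks.isEmpty()) {
>       return;
>     }
>     Thread.sleep(100);
>   }
>   fail("split log znode still has children after " + timeoutMs + "ms");
> }
> {code}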
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)