[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork

Hadoop QA (JIRA) Fri, 03 Oct 2014 15:43:58 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158621#comment-14158621
 ]


Hadoop QA commented on HBASE-12166:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672838/hbase-12166.patch
  against trunk revision .
  ATTACHMENT ID: 12672838

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
                        Please justify why no new tests are needed for this 
patch.
                        Also please list what manual steps were performed to 
verify this patch.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

     {color:red}-1 core tests{color}.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.coprocessor.TestMasterObserver
                  
org.apache.hadoop.hbase.replication.regionserver.TestRegionReplicaReplicationEndpoint

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11211//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11211//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11211//console

This message is automatically generated.

> TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
> ---------------------------------------------------------------
>
>                 Key: HBASE-12166
>                 URL: https://issues.apache.org/jira/browse/HBASE-12166
>             Project: HBase
>          Issue Type: Bug
>          Components: test, wal
>            Reporter: stack
>            Assignee: Jimmy Xiang
>             Fix For: 2.0.0, 0.99.1
>
>         Attachments: 12166.txt, hbase-12166.patch, log.txt
>
>
> See 
> https://builds.apache.org/job/PreCommit-HBASE-Build/11204//testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testMasterStartsUpWithLogReplayWork/
> The namespace region gets stuck.  It is never 'recovered' even though we have 
> finished log splitting.  Here is the main exception:
> {code}
> 4941 2014-10-03 02:00:36,862 DEBUG 
> [B.defaultRpcServer.handler=1,queue=0,port=37113] ipc.CallRunner(111): 
> B.defaultRpcServer.handler=1,queue=0,port=37113: callId: 211 service: 
> ClientService methodName: Get
>       size: 99 connection: 67.195.81.144:44526
> 4942 org.apache.hadoop.hbase.exceptions.RegionInRecoveryException: 
> hbase:namespace,,1412301462277.eba5d23de65f2718715eeb22edf7edc2. is recovering
> 4943   at 
> org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:6058)
> 4944   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2086)
> 4945   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2072)
> 4946   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:5014)
> 4947   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4988)
> 4948   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1690)
> 4949   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30418)
> 4950   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
> 4951   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
> 4952   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
> 4953   at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
> 4954   at java.lang.Thread.run(Thread.java:744)  
> {code}
> See how we've finished log splitting long time previous:
> {code}
> 2014-10-03 01:57:48,129 INFO  [M_LOG_REPLAY_OPS-asf900:37113-1] 
> master.SplitLogManager(294): finished splitting (more than or equal to) 
> 197337 bytes in 1 log files in 
> [hdfs://localhost:49601/user/jenkins/hbase/WALs/asf900.gq1.ygridcore.net,40732,1412301461887-splitting]
>  in 379ms
> {code}
> If I grep for the deleting of znodes on recovery, which is when we set the 
> recovering flag to false, I see a bunch of regions but not my namespace one:
> 2014-10-03 01:57:47,330 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/1588230740 
> znode deleted. Region: 1588230740 completes recovery.
> 2014-10-03 01:57:48,119 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/adfdcf958dd958f0e2ce59072ce2209d znode deleted. 
> Region: adfdcf958dd958f0e2ce59072ce2209d completes recovery.
> 2014-10-03 01:57:48,121 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/41d438848305831b61d708a406d5ecde znode deleted. 
> Region: 41d438848305831b61d708a406d5ecde completes recovery.
> 2014-10-03 01:57:48,122 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/6a7cada80de2ae5d774fe8cd33bd4cda znode deleted. 
> Region: 6a7cada80de2ae5d774fe8cd33bd4cda completes recovery.
> 2014-10-03 01:57:48,124 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/65451bd5b38bd16a31e25b62b3305533 znode deleted. 
> Region: 65451bd5b38bd16a31e25b62b3305533 completes recovery.
> 2014-10-03 01:57:48,125 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/07afdc3748894cf2b56e0075272a95a0 znode deleted. 
> Region: 07afdc3748894cf2b56e0075272a95a0 completes recovery.
> 2014-10-03 01:57:48,126 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/a4337ad2874ee7e599ca2344fce21583 znode deleted. 
> Region: a4337ad2874ee7e599ca2344fce21583 completes recovery.
> 2014-10-03 01:57:48,128 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/9d91d6eafe260ce33e8d7d23ccd13192 znode deleted. 
> Region: 9d91d6eafe260ce33e8d7d23ccd13192 completes recovery.
> This would seem to indicate that we successfully wrote zk that we are 
> recovering:
> {code}
> 2014-10-03 01:57:47,672 DEBUG [MASTER_SERVER_OPERATIONS-asf900:37113-0] 
> coordination.ZKSplitLogManagerCoordination(647): Mark region 
> eba5d23de65f2718715eeb22edf7edc2 recovering from failed region server 
> asf900.gq1.ygridcore.net,42071,1412301461790
> {code}
> As part of the open of namespace, we updated our last flushed id successfully:
> {code}
> 2ae5d774fe8cd33bd4cda/family
> 2014-10-03 01:57:47,698 DEBUG [Thread-9216-EventThread] 
> zookeeper.ZooKeeperWatcher(304): master:37113-0x148d3b7d84b020f, 
> quorum=localhost:50702, baseZNode=/hbase Received ZooKeeper Event, 
> type=NodeDataChanged, state=SyncConnected, 
> path=/hbase/recovering-regions/eba5d23de65f2718715eeb22edf7edc2
> 2014-10-03 01:57:47,699 DEBUG 
> [PostOpenDeployTasks:eba5d23de65f2718715eeb22edf7edc2] 
> regionserver.HRegionServer(2947): Update last flushed sequence id of region 
> eba5d23de65f2718715eeb22edf7edc2 for 
> asf900.gq1.ygridcore.net,42071,1412301461790
> {code}
> Not sure why this node is up in zk.
> Let me dig again later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork

Reply via email to