[jira] [Commented] (HBASE-24897) RegionReplicaFlushHandler should handle NoServerForRegionException to avoid aborting RegionServer

Hudson (Jira) Wed, 19 Aug 2020 17:07:19 -0700


    [ https://issues.apache.org/jira/browse/HBASE-24897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17180887#comment-17180887 ]


Hudson commented on HBASE-24897:
--------------------------------

Results for branch branch-2.2
        [build #28 on builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/28/]:
 (x) *{color:red}-1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/28//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/28//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/28//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> RegionReplicaFlushHandler should handle NoServerForRegionException to avoid aborting RegionServer
> -------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-24897
>                 URL: https://issues.apache.org/jira/browse/HBASE-24897
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>            Priority: Major
>
> While debugging the flaky test TestRegionReplicaReplicationEndpoint, I found that the RS aborted because a RegionReplicaFlushHandler flush failed. When a new table with region replicas is created, the assignment order may be:
>  # Assign the 0002 replica region, which triggers a flush of the primary region.
>  # Assign the 0001 replica region, which triggers a flush of the primary region.
>  # Assign the primary region.
> The primary region flush may then fail because the primary region is not open yet, and that failure may abort the RS.
>  
> {code:java}
> 2020-08-18 16:56:30,041 INFO [RS_OPEN_REGION-regionserver/hao-OptiPlex-7050:0-0] handler.AssignRegionHandler(141): Opened testRegionReplicaReplicationIgnoresDisabledTables_drop_false_disabledReplication_false,,1597740978463_0002.66e9757a05fbae7623cfea3369fc8354.
> 2020-08-18 16:56:30,558 INFO [RS_OPEN_REGION-regionserver/hao-OptiPlex-7050:0-0] handler.AssignRegionHandler(141): Opened testRegionReplicaReplicationIgnoresDisabledTables_drop_false_disabledReplication_false,,1597740978463_0001.22ff45423b0f1f0e93794f673449d140.
> 2020-08-18 16:56:31,192 INFO [RS_OPEN_REGION-regionserver/hao-OptiPlex-7050:0-0] handler.AssignRegionHandler(141): Opened testRegionReplicaReplicationIgnoresDisabledTables_drop_false_disabledReplication_false,,1597740978463.901f9cd06bbf27ef7c2d70b5af725cd2.
> 2020-08-18 16:58:53,857 ERROR [RS_REGION_REPLICA_FLUSH_OPS-regionserver/hao-OptiPlex-7050:0-0] helpers.MarkerIgnoringBase(159): ***** ABORTING region server hao-optiplex-7050,36368,1597740961432: ServerAborting because an exception was thrown *****
> org.apache.hadoop.hbase.client.NoServerForRegionException: No server address listed in hbase:meta for region testRegionReplicaReplicationWithReplicas_10,,1597741128945.0f541dc1a7ca64797c4cf054adb9edfb. containing row 
>   at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:926)
>   at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:784)
>   at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:140)
>   at org.apache.hadoop.hbase.client.RegionAdminServiceCallable.getRegionLocations(RegionAdminServiceCallable.java:147)
>   at org.apache.hadoop.hbase.client.RegionAdminServiceCallable.getLocation(RegionAdminServiceCallable.java:98)
>   at org.apache.hadoop.hbase.client.RegionAdminServiceCallable.prepare(RegionAdminServiceCallable.java:84)
>   at org.apache.hadoop.hbase.client.FlushRegionCallable.prepare(FlushRegionCallable.java:62)
>   at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
>   at org.apache.hadoop.hbase.regionserver.handler.RegionReplicaFlushHandler.triggerFlushInPrimaryRegion(RegionReplicaFlushHandler.java:129)
>   at org.apache.hadoop.hbase.regionserver.handler.RegionReplicaFlushHandler.process(RegionReplicaFlushHandler.java:78)
>   at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> I think the fix should be to assign the primary region first when the region replica feature is enabled. I will check the implementation of region replicas.
>  
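
The issue title suggests that RegionReplicaFlushHandler should handle NoServerForRegionException instead of letting it escape and abort the RegionServer. Below is a minimal Java sketch of that idea, assuming hypothetical stand-ins throughout: ReplicaFlushRetrySketch, PrimaryFlush, flushWithRetry, and a local NoServerForRegionException class that replaces the real org.apache.hadoop.hbase.client one. It is not HBase's handler or the committed fix; it only illustrates a bounded retry with a fixed pause while the primary has no server listed in hbase:meta.

```java
public class ReplicaFlushRetrySketch {

    // Hypothetical stand-in for HBase's NoServerForRegionException; the real
    // class in org.apache.hadoop.hbase.client is not used here.
    static class NoServerForRegionException extends Exception {
        NoServerForRegionException(String msg) {
            super(msg);
        }
    }

    // Hypothetical abstraction over "ask the primary region to flush".
    interface PrimaryFlush {
        void flush() throws NoServerForRegionException;
    }

    // Retries the flush while the primary has no server assigned.
    // Returns true on success, false if attempts ran out or the thread was
    // interrupted. The exception is never rethrown, so the caller can log
    // and move on instead of aborting the RegionServer.
    static boolean flushWithRetry(PrimaryFlush flush, int maxAttempts, long pauseMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                flush.flush();
                return true;
            } catch (NoServerForRegionException e) {
                // Primary not open yet (replicas were opened first): back off and retry.
                if (attempt == maxAttempts) {
                    break;
                }
                try {
                    Thread.sleep(pauseMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated primary that only gets a server on the third attempt.
        PrimaryFlush primary = () -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new NoServerForRegionException("primary region not opened yet");
            }
        };
        boolean ok = flushWithRetry(primary, 5, 10L);
        System.out.println(ok + " after " + calls[0] + " attempts");
    }
}
```

Returning false rather than throwing leaves the decision (log, reschedule, or skip the flush) to the caller; the attempt count and pause here are arbitrary illustration values, not HBase settings.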

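The other fix floated in the description, assigning the primary region first, amounts to queuing regions by replica ID so that replica 0 (the primary) precedes _0001 and _0002. A minimal Java sketch under that assumption, using a hypothetical ReplicaRegion holder and primaryFirst helper; it is not HBase's assignment code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class PrimaryFirstAssignSketch {

    // Hypothetical, simplified region descriptor holding only what the ordering needs.
    static final class ReplicaRegion {
        private final String encodedName;
        private final int replicaId; // 0 = primary, 1 = _0001, 2 = _0002, ...

        ReplicaRegion(String encodedName, int replicaId) {
            this.encodedName = encodedName;
            this.replicaId = replicaId;
        }

        String encodedName() {
            return encodedName;
        }

        int replicaId() {
            return replicaId;
        }
    }

    // Returns a copy ordered by replica ID, so every primary (ID 0) is queued
    // before any secondary replica. The input list is left unchanged.
    static List<ReplicaRegion> primaryFirst(List<ReplicaRegion> regions) {
        List<ReplicaRegion> ordered = new ArrayList<>(regions);
        ordered.sort(Comparator.comparingInt(ReplicaRegion::replicaId)); // stable sort
        return ordered;
    }

    public static void main(String[] args) {
        // The order seen in the log above: _0002, then _0001, then the primary.
        List<ReplicaRegion> observed = Arrays.asList(
            new ReplicaRegion("66e9757a05fbae7623cfea3369fc8354", 2),
            new ReplicaRegion("22ff45423b0f1f0e93794f673449d140", 1),
            new ReplicaRegion("901f9cd06bbf27ef7c2d70b5af725cd2", 0));
        for (ReplicaRegion r : primaryFirst(observed)) {
            System.out.println(r.replicaId() + " " + r.encodedName());
        }
    }
}
```

Because List.sort is stable, regions that share a replica ID keep their relative order. Ordering only narrows the window, though: region opens are asynchronous and may land on different RegionServers, so a secondary's flush request can still reach the primary before it is open, which is why handling the exception in the flush handler may still be needed.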


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
