[ 
https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16292062#comment-16292062
 ] 

stack commented on HBASE-18946:
-------------------------------

Failure was TestMasterFailover. It failed to write its xml because it timed out 
twice. Looking at the test, it tries to set zk node states and move meta 
regions off master, neither of which makes sense in AMv2. I refactored the 
TestMasterFailover test that does this nonsense.

I see other timeouts though it looks like most other tests just pass. Here is 
what I see in console:

TestDLSFSHLog
TestStochasticLoadBalancer
TestReplicationZKNodeCleaner
TestLogsCleaner

These all pass locally w/o issue EXCEPT TestDLSFSHLog. It looks sick, stuck. 
Digging, indeed, it's the fault of this patch. We try to keep sending state 
change messages to master as long as we can, but the thread is not a daemon, 
so it keeps the RS up. Ugh! Fixed.
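
For anyone following along, here is a minimal sketch of the daemon vs 
non-daemon distinction behind the hang. It is not the code from the patch: 
the class name DaemonReporterSketch, the method newReporter and the thread 
name are made up for illustration, and marking the thread as daemon is only 
one common way to stop such a thread from pinning the JVM; the actual fix in 
the patch may differ.

```java
public class DaemonReporterSketch {

  /**
   * Hypothetical stand-in for a thread that keeps retrying a report to the
   * master. The loop body is a placeholder, not HBase code.
   */
  static Thread newReporter(boolean daemon) {
    Thread t = new Thread(() -> {
      try {
        while (true) {
          // Placeholder for "send state change to master"; retries forever.
          Thread.sleep(100);
        }
      } catch (InterruptedException e) {
        // Restore the interrupt flag and let the thread exit.
        Thread.currentThread().interrupt();
      }
    }, "state-reporter-sketch");
    // Must be called before start(). A non-daemon thread keeps the JVM
    // alive even after main() returns; a daemon thread does not.
    t.setDaemon(daemon);
    return t;
  }

  public static void main(String[] args) {
    Thread reporter = newReporter(true);
    reporter.start();
    System.out.println("reporter daemon=" + reporter.isDaemon());
    // main() returns here. With daemon=true the process exits; with
    // daemon=false the endless retry loop would keep the process up.
  }
}
```

Passing false to newReporter in main reproduces the "process never exits" 
behaviour, which is the shape of what the stuck test was showing.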

> Stochastic load balancer assigns replica regions to the same RS
> ---------------------------------------------------------------
>
>                 Key: HBASE-18946
>                 URL: https://issues.apache.org/jira/browse/HBASE-18946
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0-alpha-3
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: stack
>             Fix For: 2.0.0-beta-1
>
>         Attachments: HBASE-18946.master.001.patch, 
> HBASE-18946.master.002.patch, HBASE-18946.master.003.patch, 
> HBASE-18946.master.004.patch, HBASE-18946.master.005.patch, 
> HBASE-18946.master.006.patch, HBASE-18946.master.007.patch, 
> HBASE-18946.master.008.patch, HBASE-18946.master.009.patch, 
> HBASE-18946.master.010.patch, HBASE-18946.master.011.patch, 
> HBASE-18946.patch, HBASE-18946.patch, HBASE-18946_2.patch, 
> HBASE-18946_2.patch, HBASE-18946_simple_7.patch, HBASE-18946_simple_8.patch, 
> TestRegionReplicasWithRestartScenarios.java
>
>
> Trying out region replica and its assignment I can see that sometimes the 
> default LB (Stochastic load balancer) assigns replica regions to the same RS. 
> This happens when we have 3 RS checked in and we have a table with 3 
> replicas. When a RS goes down then the replicas being assigned to same RS is 
> acceptable but the case when we have enough RS to assign this behaviour is 
> undesirable and does not solve the purpose of replicas. 
> [~huaxiang] and [~enis]. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
