[ 
https://issues.apache.org/jira/browse/HBASE-25829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-25829:
----------------------------------------
    Description: 
Seen after an integration test load with 'calm' monkey, so this happened in the 
happy path.

There were no errors accessing all loaded table data. The integration test 
writes a log to HDFS of every cell written to HBase and the verify phase uses 
that log to read each value and confirm it. That seems fine:
{noformat}
2021-04-30 02:16:33,316 INFO  [main] test.IntegrationTestLoadCommonCrawl$Verify: REFERENCED: 154943544
2021-04-30 02:16:33,316 INFO  [main] test.IntegrationTestLoadCommonCrawl$Verify: UNREFERENCED: 0
2021-04-30 02:16:33,316 INFO  [main] test.IntegrationTestLoadCommonCrawl$Verify: CORRUPT: 0
{noformat}

However, whenever the balancer runs, a number of concerning INFO-level log 
messages are printed, of the form _assignment.RegionStates: Skipping, no server 
for state=SPLIT, location=null, table=TABLENAME_ 

For example:
{noformat}
2021-04-30 02:02:09,286 INFO  [master/ip-172-31-58-47:8100.Chore.2] assignment.RegionStates: Skipping, no server for state=SPLIT, location=null, table=IntegrationTestLoadCommonCrawl, region=087fb2f7847c2fc0a0b85eb30a97036e
2021-04-30 02:02:09,286 INFO  [master/ip-172-31-58-47:8100.Chore.2] assignment.RegionStates: Skipping, no server for state=SPLIT, location=null, table=IntegrationTestLoadCommonCrawl, region=0952b94a920454afe9c40becbb7bf205
2021-04-30 02:02:09,286 INFO  [master/ip-172-31-58-47:8100.Chore.2] assignment.RegionStates: Skipping, no server for state=SPLIT, location=null, table=IntegrationTestLoadCommonCrawl, region=f87a8b993f7eca2524bf2331b7ee3c06
2021-04-30 02:02:09,286 INFO  [master/ip-172-31-58-47:8100.Chore.2] assignment.RegionStates: Skipping, no server for state=SPLIT, location=null, table=IntegrationTestLoadCommonCrawl, region=74bb28864a120decdf0f4956741df745
2021-04-30 02:02:09,286 INFO  [master/ip-172-31-58-47:8100.Chore.2] assignment.RegionStates: Skipping, no server for state=SPLIT, location=null, table=IntegrationTestLoadCommonCrawl, region=bc918b609ade0ae4d5530f0467354cae
2021-04-30 02:02:09,286 INFO  [master/ip-172-31-58-47:8100.Chore.2] assignment.RegionStates: Skipping, no server for state=SPLIT, location=null, table=IntegrationTestLoadCommonCrawl, region=183a199984539f3917a2f8927fe01572
2021-04-30 02:02:09,286 INFO  [master/ip-172-31-58-47:8100.Chore.2] assignment.RegionStates: Skipping, no server for state=SPLIT, location=null, table=IntegrationTestLoadCommonCrawl, region=6cc5ce4fb4adc00445b3ec7dd8760ba8
{noformat}
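
To tally the affected regions from a master log, a small script along these 
lines may help. It is only a sketch: the helper name 
split_regions_without_server is hypothetical, and the regex assumes the 
message wording shown in the excerpt above, tolerating lines that were wrapped 
in transit.

```python
import re

# Matches the RegionStates message from the excerpt above.
# \s+ between tokens tolerates log lines that were wrapped (e.g. by email).
SKIP_RE = re.compile(
    r"assignment\.RegionStates:\s+Skipping,\s+no\s+server\s+for\s+"
    r"state=SPLIT,\s+location=null,\s+table=(?P<table>[^,\s]+),\s+"
    r"region=(?P<region>[0-9a-f]{32})"
)


def split_regions_without_server(log_text):
    """Return {table: set of encoded region names} for SPLIT/location=null messages."""
    found = {}
    for match in SKIP_RE.finditer(log_text):
        found.setdefault(match.group("table"), set()).add(match.group("region"))
    return found
```

Using a set per table means messages repeated on every balancer run are 
counted once per region rather than once per log line.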

The HBCK chore notices them but does nothing:

"Loaded *80 regions* from in-memory state of AssignmentManager"

"Loaded *73 regions from 5 regionservers' reports* and found 0 orphan regions"

"Loaded 3 tables 80 regions from filesystem and found 0 orphan regions"

Yes, 80 - 73 = 7, and there are exactly 7 region state records in SPLIT state 
with server=null. 

{noformat}
2021-04-30 02:02:09,300 INFO  [master/ip-172-31-58-47:8100.Chore.1] master.HbckChore: Loaded 80 regions from in-memory state of AssignmentManager
2021-04-30 02:02:09,300 INFO  [master/ip-172-31-58-47:8100.Chore.1] master.HbckChore: Loaded 73 regions from 5 regionservers' reports and found 0 orphan regions
2021-04-30 02:02:09,306 INFO  [master/ip-172-31-58-47:8100.Chore.1] master.HbckChore: Loaded 3 tables 80 regions from filesystem and found 0 orphan regions
{noformat}

This repeats indefinitely. 
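
The gap between the two HbckChore counts can also be checked mechanically. The 
following is an illustration only: hbck_region_gap is a hypothetical helper, 
and the regexes assume the HbckChore message wording shown in the excerpt 
above.

```python
import re

# Patterns follow the HbckChore messages quoted above; \s+ tolerates wrapping.
IN_MEMORY_RE = re.compile(r"Loaded\s+(\d+)\s+regions\s+from\s+in-memory\s+state")
REPORTED_RE = re.compile(
    r"Loaded\s+(\d+)\s+regions\s+from\s+(\d+)\s+regionservers'\s+reports"
)


def hbck_region_gap(log_text):
    """Return (in-memory region count) - (regionserver-reported count), or None."""
    in_memory = IN_MEMORY_RE.search(log_text)
    reported = REPORTED_RE.search(log_text)
    if in_memory is None or reported is None:
        return None  # one of the two messages is missing from the text
    return int(in_memory.group(1)) - int(reported.group(1))
```

On the excerpt above this evaluates to 80 - 73 = 7, the same as the number of 
SPLIT records without a server.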

  was:
Seen after an integration test load with 'calm' monkey, so this happened in the 
happy path.

There were no errors accessing all loaded table data, however whenever the 
balancer runs there are a number of concerning INFO level log messages printed 
of the form _assignment.RegionStates: Skipping, no server for state=SPLIT, 
location=null, table=TABLENAME_ 

For example:
{noformat}
2021-04-30 02:02:09,286 INFO  [master/ip-172-31-58-47:8100.Chore.2] 
assignment.RegionStates: Skipping, no server for state=SPLIT, location=null, 
table=IntegrationTestLoadCommonCrawl, region=087fb2f7847c2fc0a0b85eb30a97036e
2021-04-30 02:02:09,286 INFO  [master/ip-172-31-58-47:8100.Chore.2] 
assignment.RegionStates: Skipping, no server for state=SPLIT, location=null, 
table=IntegrationTestLoadCommonCrawl, region=0952b94a920454afe9c40becbb7bf205
2021-04-30 02:02:09,286 INFO  [master/ip-172-31-58-47:8100.Chore.2] 
assignment.RegionStates: Skipping, no server for state=SPLIT, location=null, 
table=IntegrationTestLoadCommonCrawl, region=f87a8b993f7eca2524bf2331b7ee3c06
2021-04-30 02:02:09,286 INFO  [master/ip-172-31-58-47:8100.Chore.2] 
assignment.RegionStates: Skipping, no server for state=SPLIT, location=null, 
table=IntegrationTestLoadCommonCrawl, region=74bb28864a120decdf0f4956741df745
2021-04-30 02:02:09,286 INFO  [master/ip-172-31-58-47:8100.Chore.2] 
assignment.RegionStates: Skipping, no server for state=SPLIT, location=null, 
table=IntegrationTestLoadCommonCrawl, region=bc918b609ade0ae4d5530f0467354cae
2021-04-30 02:02:09,286 INFO  [master/ip-172-31-58-47:8100.Chore.2] 
assignment.RegionStates: Skipping, no server for state=SPLIT, location=null, 
table=IntegrationTestLoadCommonCrawl, region=183a199984539f3917a2f8927fe01572
2021-04-30 02:02:09,286 INFO  [master/ip-172-31-58-47:8100.Chore.2] 
assignment.RegionStates: Skipping, no server for state=SPLIT, location=null, 
table=IntegrationTestLoadCommonCrawl, region=6cc5ce4fb4adc00445b3ec7dd8760ba8
{noformat}

The HBCK chore notices them but does nothing:

"Loaded *80 regions* from in-memory state of AssignmentManager"

"Loaded *73 regions from 5 regionservers' reports* and found 0 orphan regions"

"Loaded 3 tables 80 regions from filesystem and found 0 orphan regions"

Yes, there are exactly 7 region state records of SPLIT state with server=null. 

{noformat}
2021-04-30 02:02:09,300 INFO  [master/ip-172-31-58-47:8100.Chore.1] 
master.HbckChore: Loaded 80 regions from in-memory state of AssignmentManager
2021-04-30 02:02:09,300 INFO  [master/ip-172-31-58-47:8100.Chore.1] 
master.HbckChore: Loaded 73 regions from 5 regionservers' reports and found 0 
orphan regions
2021-04-30 02:02:09,306 INFO  [master/ip-172-31-58-47:8100.Chore.1] 
master.HbckChore: Loaded 3 tables 80 regions from filesystem and found 0 orphan 
regions
{noformat}

This repeats indefinitely. 


> SPLIT state detritus
> --------------------
>
>                 Key: HBASE-25829
>                 URL: https://issues.apache.org/jira/browse/HBASE-25829
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.4.3
>            Reporter: Andrew Kyle Purtell
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
