[ 
https://issues.apache.org/jira/browse/HBASE-16074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342438#comment-15342438
 ] 

Elliott Clark commented on HBASE-16074:
---------------------------------------

So we had a run like this:

{code}
REFERENCED      0       1,800,000,000   1,800,000,000
UNREFERENCED    0       76      76
{code}


That is the correct number of referenced nodes, but there shouldn't be any 
unreferenced ones. So we went into the logs and found:

{code}
2016-06-21 04:28:43,314 WARN [main] org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Verify: Prev is not set for: Y\xD3\x16t\xC5\x9D1@
{code}

That row key looks really weird. It's shorter than the length we would expect.
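
For anyone who wants to double-check the length, here is a minimal sketch that decodes the escaped key from the log line and counts its bytes. It assumes the log uses the usual HBase printable form, where a non-printable byte appears as `\xNN`, and it assumes ITBLL row keys are normally 16 bytes; `decode_string_binary` and `EXPECTED_KEY_LENGTH` are names made up for this example.

```python
import re

# Assumption: ITBLL row keys are expected to be 16 bytes long.
EXPECTED_KEY_LENGTH = 16


def decode_string_binary(text):
    """Turn the printable form seen in the log back into raw bytes.

    Assumes non-printable bytes are written as \\xNN and every other
    character is plain ASCII standing for itself.
    """
    out = bytearray()
    i = 0
    while i < len(text):
        hex_pair = text[i + 2:i + 4]
        if text.startswith("\\x", i) and re.fullmatch(r"[0-9A-Fa-f]{2}", hex_pair):
            out.append(int(hex_pair, 16))
            i += 4
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)


key = decode_string_binary(r"Y\xD3\x16t\xC5\x9D1@")
print(len(key), key.hex())
print("shorter than expected:", len(key) < EXPECTED_KEY_LENGTH)
```

Under those assumptions the key from the warning decodes to 8 bytes, half of the assumed normal length.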

However, it is the split point for a region:
{code}
IntegrationTestBigLinkedList.11,Y\xD3\x16t\xC5\x9D1@,1466506812220.15898a252e1b54728dd44a2b13fca290.
    
{code}
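
To make the match explicit, here is a rough sketch that pulls the start key out of the region name and compares it with the key from the warning. It assumes the region name has the layout `<table>,<start key>,<region id>.<encoded name>.`; `region_start_key` is a hypothetical helper, not something from HBase.

```python
def region_start_key(region_name):
    """Split a region name into (table, start key).

    Assumed layout: <table>,<start key>,<region id>.<encoded name>.
    The start key may itself contain commas, so split once from the
    left (table) and once from the right (region id).
    """
    table, rest = region_name.split(",", 1)
    start_key, _region_id = rest.rsplit(",", 1)
    return table, start_key


region = (r"IntegrationTestBigLinkedList.11,Y\xD3\x16t\xC5\x9D1@,"
          r"1466506812220.15898a252e1b54728dd44a2b13fca290.")
warned_key = r"Y\xD3\x16t\xC5\x9D1@"

table, start_key = region_start_key(region)
print(table, start_key == warned_key)
```

The comparison is done on the escaped text as it appears in the logs, which is enough to show that the two strings are identical.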

Going into the shell, we found that the row does not exist:

{code}
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.3.0-fb10-SNAPSHOT, rd8d63d67152af8eed48f8863a0e13d3e71fc097c, Fri Jun 10 16:59:00 PDT 2016

hbase(main):001:0> get 'IntegrationTestBigLinkedList.11', "Y\xD3\x16t\xC5\x9D1@"
COLUMN                CELL
0 row(s) in 0.3390 seconds
{code}

That got us very worried about data loss, so we re-ran the verify step. After 
stopping the chaos monkey and letting everything settle, we got a clean verify:

{code}
REFERENCED      0       1,800,000,000   1,800,000,000
{code}
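
As a sketch of what "clean" means here, the snippet below parses counter lines like the ones above and checks that UNREFERENCED is absent or zero. It assumes the three numeric columns are map, reduce and total, and it only looks at the UNREFERENCED counter; `parse_counters` and `is_clean` are hypothetical helpers written for this example.

```python
def parse_counters(text):
    """Parse lines of the form 'NAME  <n>  <n>  <n>' into a dict.

    Assumption: the three numbers are the map, reduce and total values.
    """
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or unrelated lines
        name, *values = fields
        counters[name] = tuple(int(v.replace(",", "")) for v in values)
    return counters


def is_clean(counters):
    """A run counts as clean here if no unreferenced nodes were reported."""
    return counters.get("UNREFERENCED", (0, 0, 0))[-1] == 0


first_run = """REFERENCED      0       1,800,000,000   1,800,000,000
UNREFERENCED    0       76      76"""
second_run = "REFERENCED      0       1,800,000,000   1,800,000,000"

print(is_clean(parse_counters(first_run)))
print(is_clean(parse_counters(second_run)))
```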


> ITBLL fails, reports lost big or tiny families
> ----------------------------------------------
>
>                 Key: HBASE-16074
>                 URL: https://issues.apache.org/jira/browse/HBASE-16074
>             Project: HBase
>          Issue Type: Bug
>          Components: integration tests
>    Affects Versions: 1.3.0
>            Reporter: Mikhail Antonov
>            Assignee: Mikhail Antonov
>            Priority: Blocker
>             Fix For: 1.3.0
>
>
> Underlying MR jobs succeed but I'm seeing the following in the logs (mid-size 
> distributed test cluster):
> ERROR test.IntegrationTestBigLinkedList$Verify: Found nodes which lost big or 
> tiny families, count=164
> I do not know yet whether it's a bug, a test issue or an env setup issue, 
> but I need to figure it out. Opening this to raise awareness and see if 
> someone has seen this recently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
