[ https://issues.apache.org/jira/browse/HBASE-16074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342438#comment-15342438 ]
Elliott Clark commented on HBASE-16074:
---------------------------------------
We had a verify run that ended with these counters:
{code}
REFERENCED 0 1,800,000,000 1,800,000,000
UNREFERENCED 0 76 76
{code}
That is the correct REFERENCED count, but there shouldn't be any UNREFERENCED
nodes. So we went into the logs and found:
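For anyone unfamiliar with these counters, here is a simplified Python model of how we understand Verify to classify nodes. Each row stores a `prev` pointer to another row. A row that some other row points at is REFERENCED, a row nobody points at is UNREFERENCED, and a pointer to a row that was never written is UNDEFINED. The function `classify` and its input shape are hypothetical; the real job is a MapReduce over the table and tracks more counters than this sketch does.

```python
from collections import Counter


def classify(nodes):
    """Hypothetical, simplified model of ITBLL Verify's node counters.

    `nodes` maps each row key (bytes) to its prev pointer (bytes), or to
    None when the row has no prev column.
    """
    # "Map" side: every row with a prev pointer emits one reference to it.
    referenced_by = Counter()
    for key, prev in nodes.items():
        if prev is None:
            # Corresponds to a "Prev is not set for: ..." warning:
            # this row emits no reference at all.
            continue
        referenced_by[prev] += 1

    # "Reduce" side: classify every key that was defined or referenced.
    counts = Counter()
    for key in nodes:
        if referenced_by[key] > 0:
            counts["REFERENCED"] += 1
        else:
            counts["UNREFERENCED"] += 1
    for target in referenced_by:
        if target not in nodes:
            # Something points at a row that was never written.
            counts["UNDEFINED"] += 1
    return counts
```

In a fully written circular list every node is REFERENCED, which is why any nonzero UNREFERENCED count is a red flag. A row whose `prev` is missing emits no reference, so the node it should point at is counted as UNREFERENCED.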
{code}
2016-06-21 04:28:43,314 WARN [main] org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Verify: Prev is not set for: Y\xD3\x16t\xC5\x9D1@
{code}
That row key looks really weird: it's shorter than we would expect. However, it
is the split point for a region:
{code}
IntegrationTestBigLinkedList.11,Y\xD3\x16t\xC5\x9D1@,1466506812220.15898a252e1b54728dd44a2b13fca290.
{code}
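Both the log line and the region name print the key with HBase's printable-binary escaping (`\xNN` for non-printable bytes; on the Java side `Bytes.toBytesBinary` does the decoding). The Python sketch below, which assumes only that escaping convention, decodes both forms and confirms they name the same 8-byte key. ITBLL keys are normally longer; we believe the generator's default is 16 bytes, but treat that figure as an assumption. `from_string_binary` and `split_point` are hypothetical helpers, not HBase APIs.

```python
import re

# Matches one \xNN escape as printed by HBase's printable-binary format.
_ESCAPE = re.compile(rb"\\x([0-9A-Fa-f]{2})")


def from_string_binary(text: str) -> bytes:
    """Decode printable-binary text (\\xNN escapes) into raw bytes.

    Printable characters pass through unchanged; each \\xNN escape
    becomes the single byte 0xNN.
    """
    raw = text.encode("latin-1")
    return _ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)


def split_point(region_name: str) -> bytes:
    """Extract the start key from 'table,startKey,regionId.encodedName.'.

    Table names cannot contain commas but start keys can, so split the
    table off the front and the region id off the back.
    """
    _table, rest = region_name.split(",", 1)
    start_key, _region_id = rest.rsplit(",", 1)
    return from_string_binary(start_key)
```

Usage: `from_string_binary(r"Y\xD3\x16t\xC5\x9D1@")` yields an 8-byte key, and `split_point` applied to the region name yields the same bytes, so the "missing" row is exactly the region boundary rather than a random key.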
Looking in the shell, that row does not exist:
{code}
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.3.0-fb10-SNAPSHOT, rd8d63d67152af8eed48f8863a0e13d3e71fc097c, Fri Jun 10 16:59:00 PDT 2016
hbase(main):001:0> get 'IntegrationTestBigLinkedList.11', "Y\xD3\x16t\xC5\x9D1@"
COLUMN                              CELL
0 row(s) in 0.3390 seconds
{code}
That got us very worried about data loss, so we re-ran the verify step. After
stopping the chaos monkey and letting everything settle, we got a clean verify:
{code}
REFERENCED 0 1,800,000,000 1,800,000,000
{code}
> ITBLL fails, reports lost big or tiny families
> ----------------------------------------------
>
> Key: HBASE-16074
> URL: https://issues.apache.org/jira/browse/HBASE-16074
> Project: HBase
> Issue Type: Bug
> Components: integration tests
> Affects Versions: 1.3.0
> Reporter: Mikhail Antonov
> Assignee: Mikhail Antonov
> Priority: Blocker
> Fix For: 1.3.0
>
>
> The underlying MR jobs succeed, but I'm seeing the following in the logs
> (mid-size distributed test cluster):
> ERROR test.IntegrationTestBigLinkedList$Verify: Found nodes which lost big or
> tiny families, count=164
> I don't know exactly yet whether it's a bug, a test issue, or an environment
> setup issue, but we need to figure it out. Opening this to raise awareness and
> see if anyone has seen this recently.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)