[ 
https://issues.apache.org/jira/browse/HBASE-16232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15386652#comment-15386652
 ] 

Mikhail Antonov commented on HBASE-16232:
-----------------------------------------

One observation so far, which may be completely irrelevant.

During the generator run that led to the lost keys, I see the following in the 
logs for one region. I did not see these on other runs:

{code}
WARN backup.HFileArchiver: Failed to archive class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:<cluster>/data/default/IntegrationTestBigLinkedList.5/ee81b123a9d48bf086d1a237c35c563b/tiny/81580c0a6d8f41cdb83a1e9abf6491df on try #0
java.io.FileNotFoundException: File/Directory <cluster>/data/default/IntegrationTestBigLinkedList.5/ee81b123a9d48bf086d1a237c35c563b/tiny/81580c0a6d8f41cdb83a1e9abf6491df does not exist.
        at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setTimes(FSDirAttrOp.java:121)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setTimes(FSNamesystem.java:1907)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setTimes(NameNodeRpcServer.java:1222)
{code}

I then get:

{code}
WARN backup.HFileArchiver: Failed to complete archive of: [class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, <cluster>/data/default/IntegrationTestBigLinkedList.5/ee81b123a9d48bf086d1a237c35c563b/tiny/32029c46f36e481b99a7af10900e533a, <8 more files here>] Those files are still in the original location, and they may slow down reads.
{code}

and

{code}
FATAL regionserver.HRegionServer: ABORTING region server <server>: Unrecoverable exception while closing region IntegrationTestBigLinkedList.5,$rA\xC0gzW\x14,1468971262948.ee81b123a9d48bf086d1a237c35c563b., still finishing close
java.io.IOException: java.io.IOException: Failed to archive/delete all the files for region:IntegrationTestBigLinkedList.5,$rA?gzW,1468971262948.ee81b123a9d48bf086d1a237c35c563b., family:tiny into <cluster>/archive/data/default/IntegrationTestBigLinkedList.5/ee81b123a9d48bf086d1a237c35c563b/tiny. Something is probably awry on the filesystem.
        at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1516)
        at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1378)
        at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:138)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failed to archive/delete all the files for region:IntegrationTestBigLinkedList.5,$rA?gzW,1468971262948.ee81b123a9d48bf086d1a237c35c563b., family:tiny into <cluster>/archive/data/default/IntegrationTestBigLinkedList.5/ee81b123a9d48bf086d1a237c35c563b/tiny. Something is probably awry on the filesystem.
        at org.apache.hadoop.hbase.backup.HFileArchiver.archiveStoreFiles(HFileArchiver.java:233)
        at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.removeStoreFiles(HRegionFileSystem.java:424)
        at org.apache.hadoop.hbase.regionserver.HStore.removeCompactedfiles(HStore.java:2699)
        at org.apache.hadoop.hbase.regionserver.HStore.close(HStore.java:835)
        at org.apache.hadoop.hbase.regionserver.HStore.close(HStore.java:116)
        at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1498)
        at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1494)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        ... 3 more
{code}

and then

ERROR executor.EventHandler: Caught throwable while processing event M_RS_CLOSE_REGION
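
To compare runs, the "Failed to archive" warnings quoted above could be tallied per region and column family. The following is only a rough sketch, not an HBase tool: the function name `count_archive_failures` and the regular expression are made up for illustration, and it assumes plain-text region server logs whose store file paths follow the `/data/<namespace>/<table>/<region>/<family>/<hfile>` layout seen in the log lines above.

```python
import re
from collections import Counter

# Assumed path layout, taken from the log lines quoted above:
#   .../data/<namespace>/<table>/<32-hex region>/<family>/<32-hex hfile>
STORE_FILE = re.compile(
    r"/data/(?P<ns>[^/\s]+)/(?P<table>[^/\s]+)/(?P<region>[0-9a-f]{32})"
    r"/(?P<family>[^/\s]+)/(?P<hfile>[0-9a-f]{32})"
)


def count_archive_failures(lines):
    """Count 'Failed to archive' warnings per (region, family).

    The store file path may be wrapped onto a line after the warning,
    so the first matching path seen after each warning is attributed
    to that warning; later paths (e.g. in the exception text) are ignored.
    """
    counts = Counter()
    pending = False
    for line in lines:
        if "HFileArchiver: Failed to archive" in line:
            pending = True
        if pending:
            match = STORE_FILE.search(line)
            if match:
                counts[(match.group("region"), match.group("family"))] += 1
                pending = False
    return counts
```

Feeding it the lines of a region server log (for example `count_archive_failures(open(path))`, with `path` pointing at whatever log file is being examined) gives a counter keyed by encoded region name and family, which makes it easier to see whether the failures are confined to one region.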

> ITBLL fails on branch-1.3, now losing actual keys
> -------------------------------------------------
>
>                 Key: HBASE-16232
>                 URL: https://issues.apache.org/jira/browse/HBASE-16232
>             Project: HBase
>          Issue Type: Bug
>          Components: dataloss, integration tests
>    Affects Versions: 1.3.0
>            Reporter: Mikhail Antonov
>            Assignee: Mikhail Antonov
>            Priority: Blocker
>
> So I'm running ITBLL off branch-1.3 on a recent commit (after [~stack]'s fix 
> for fake keys showing up in the scans) with an increased number of regions per 
> regionserver and seeing the following.
> {quote} 
> $Verify$Counts
> REFERENCED    0       4,999,999,994   4,999,999,994
> UNDEFINED     0       3       3
> UNREFERENCED  0       3       3
> {quote}
> So we're losing some keys. This time those aren't fake:
> {quote}
> undef 
> \x89\x10\xE0\xBBx\xF1\xC4\xBAY`\xC4\xD77\x87\x84\x0F  0       1       1
> \x89\x11\x0F\xBA@\x0D8^\xAE \xB1\xCAh\xEB&\xE3        0       1       1
> \x89\x16waxv;\xB1\xE3Z\xE6"|\xFC\xBE\x9A      0       1       1
> unref 
> \x15\x1F*f\x92i6\x86\x1D\x8E\xB7\xE1\xC1=\x96\xEF     0       1       1
> \xF4G\xC6E\xD6\xF1\xAB\xB7\xDB\xC0\x94\xF2\xE7mN\xEC  0       1       1
> U\x0F'\x88\x106\x19\x1C\x87Y"\xF3\xE6\xC1\xC8\x15
> {quote}
> Re-running the verify step with CM off still shows this issue. The Search tool 
> reports:
> {quote}
> Total
> \x89\x11\x0F\xBA@\x0D8^\xAE \xB1\xCAh\xEB&\xE3        5       0       5
> \x89\x16waxv;\xB1\xE3Z\xE6"|\xFC\xBE\x9A      4       0       4
> CELL_WITH_MISSING_ROW 15      0       15
> {quote}
> Will post more as I dig into it.



