[
https://issues.apache.org/jira/browse/HBASE-16232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15386652#comment-15386652
]
Mikhail Antonov commented on HBASE-16232:
-----------------------------------------
One observation so far, which may be completely irrelevant.
During the generator run that led to the lost keys, I'm seeing the following in the
logs for one region. I didn't see these on other runs:
{code}
WARN backup.HFileArchiver: Failed to archive class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:<cluster>/data/default/IntegrationTestBigLinkedList.5/ee81b123a9d48bf086d1a237c35c563b/tiny/81580c0a6d8f41cdb83a1e9abf6491df on try #0
java.io.FileNotFoundException: File/Directory <cluster>/data/default/IntegrationTestBigLinkedList.5/ee81b123a9d48bf086d1a237c35c563b/tiny/81580c0a6d8f41cdb83a1e9abf6491df does not exist.
    at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setTimes(FSDirAttrOp.java:121)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setTimes(FSNamesystem.java:1907)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setTimes(NameNodeRpcServer.java:1222)
{code}
I'm then getting:
{code}
WARN backup.HFileArchiver: Failed to complete archive of: [class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, <cluster>/data/default/IntegrationTestBigLinkedList.5/ee81b123a9d48bf086d1a237c35c563b/tiny/32029c46f36e481b99a7af10900e533a, <8 more files here>]
Those files are still in the original location, and they may slow down reads.
{code}
and
{code}
FATAL regionserver.HRegionServer: ABORTING region server <server>: Unrecoverable exception while closing region IntegrationTestBigLinkedList.5,$rA\xC0gzW\x14,1468971262948.ee81b123a9d48bf086d1a237c35c563b., still finishing close
java.io.IOException: java.io.IOException: Failed to archive/delete all the files for region:IntegrationTestBigLinkedList.5,$rA?gzW,1468971262948.ee81b123a9d48bf086d1a237c35c563b., family:tiny into <cluster>/archive/data/default/IntegrationTestBigLinkedList.5/ee81b123a9d48bf086d1a237c35c563b/tiny. Something is probably awry on the filesystem.
    at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1516)
    at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1378)
    at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:138)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failed to archive/delete all the files for region:IntegrationTestBigLinkedList.5,$rA?gzW,1468971262948.ee81b123a9d48bf086d1a237c35c563b., family:tiny into <cluster>/archive/data/default/IntegrationTestBigLinkedList.5/ee81b123a9d48bf086d1a237c35c563b/tiny. Something is probably awry on the filesystem.
    at org.apache.hadoop.hbase.backup.HFileArchiver.archiveStoreFiles(HFileArchiver.java:233)
    at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.removeStoreFiles(HRegionFileSystem.java:424)
    at org.apache.hadoop.hbase.regionserver.HStore.removeCompactedfiles(HStore.java:2699)
    at org.apache.hadoop.hbase.regionserver.HStore.close(HStore.java:835)
    at org.apache.hadoop.hbase.regionserver.HStore.close(HStore.java:116)
    at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1498)
    at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1494)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
{code}
and then:
{code}
ERROR executor.EventHandler: Caught throwable while processing event M_RS_CLOSE_REGION
{code}
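The three excerpts above read as one failure chain. A store file that is already gone cannot be moved into the archive, the archive step reports files it could not archive, and the region close path turns that into an IOException, after which the region server aborts. Below is a deliberately simplified, hypothetical model of that chain. It is not HBase's HFileArchiver: the class ArchiveSketch, its in-memory storeDir/archiveDir sets, the method names, and the retry budget MAX_TRIES are all assumptions made for illustration.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of the archive-on-close failure chain; not HBase code. */
public class ArchiveSketch {
    static final int MAX_TRIES = 3; // assumed retry budget, not HBase's actual value

    final Set<String> storeDir = new HashSet<>();   // files still in the region's store dir
    final Set<String> archiveDir = new HashSet<>(); // files already moved to the archive

    /** Moves one file; fails if the source has already vanished. */
    void moveToArchive(String file) throws IOException {
        if (!storeDir.remove(file)) {
            throw new FileNotFoundException("File/Directory " + file + " does not exist.");
        }
        archiveDir.add(file);
    }

    /** Tries each file up to MAX_TRIES times; returns the files that never moved. */
    List<String> archiveAll(List<String> files) {
        List<String> failed = new ArrayList<>();
        for (String f : files) {
            boolean done = false;
            for (int attempt = 0; attempt < MAX_TRIES && !done; attempt++) {
                try {
                    moveToArchive(f);
                    done = true;
                } catch (IOException e) {
                    System.err.println("Failed to archive " + f + " on try #" + attempt);
                }
            }
            if (!done) {
                failed.add(f);
            }
        }
        return failed;
    }

    /** Mirrors the close path: any file left unarchived becomes an IOException. */
    void removeStoreFiles(List<String> files) throws IOException {
        List<String> failed = archiveAll(files);
        if (!failed.isEmpty()) {
            throw new IOException("Failed to archive/delete all the files: " + failed);
        }
    }

    public static void main(String[] args) {
        ArchiveSketch s = new ArchiveSketch();
        s.storeDir.add("fileA"); // hypothetical file names
        // "fileB" is requested for archiving but was already removed elsewhere.
        try {
            s.removeStoreFiles(Arrays.asList("fileA", "fileB"));
        } catch (IOException e) {
            // In the logs above, this is the point where the region close fails.
            System.err.println("close failed: " + e.getMessage());
        }
    }
}
```

In this model a vanished source file fails on every attempt, so retrying cannot help and the close path always throws.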
> ITBLL fails on branch-1.3, now loosing actual keys
> --------------------------------------------------
>
> Key: HBASE-16232
> URL: https://issues.apache.org/jira/browse/HBASE-16232
> Project: HBase
> Issue Type: Bug
> Components: dataloss, integration tests
> Affects Versions: 1.3.0
> Reporter: Mikhail Antonov
> Assignee: Mikhail Antonov
> Priority: Blocker
>
> So I'm running ITBLL off a recent commit on branch-1.3 (after [~stack]'s fix
> for fake keys showing up in scans) with an increased number of regions per
> regionserver, and I'm seeing the following.
> {quote}
> $Verify$Counts
> REFERENCED 0 4,999,999,994 4,999,999,994
> UNDEFINED 0 3 3
> UNREFERENCED 0 3 3
> {quote}
> So we're losing some keys. This time they aren't fake:
> {quote}
> undef
> \x89\x10\xE0\xBBx\xF1\xC4\xBAY`\xC4\xD77\x87\x84\x0F 0 1 1
> \x89\x11\x0F\xBA@\x0D8^\xAE \xB1\xCAh\xEB&\xE3 0 1 1
> \x89\x16waxv;\xB1\xE3Z\xE6"|\xFC\xBE\x9A 0 1 1
> unref
> \x15\x1F*f\x92i6\x86\x1D\x8E\xB7\xE1\xC1=\x96\xEF 0 1 1
> \xF4G\xC6E\xD6\xF1\xAB\xB7\xDB\xC0\x94\xF2\xE7mN\xEC 0 1 1
> U\x0F'\x88\x106\x19\x1C\x87Y"\xF3\xE6\xC1\xC8\x15
> {quote}
> Re-running the verify step with CM (ChaosMonkey) off still shows the issue. The Search tool
> reports:
> {quote}
> Total
> \x89\x11\x0F\xBA@\x0D8^\xAE \xB1\xCAh\xEB&\xE3 5 0 5
> \x89\x16waxv;\xB1\xE3Z\xE6"|\xFC\xBE\x9A 4 0 4
> CELL_WITH_MISSING_ROW 15 0 15
> {quote}
> Will post more as I dig into it.
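The Verify counts quoted above are consistent with a handful of lost rows. Consider a minimal sketch that assumes each node stores a single back-pointer to its predecessor in one circular list. This is modeled loosely on ITBLL's linked lists; the class VerifySketch and its Counts holder are hypothetical, not the test's code. Under that assumption, a missing row X becomes UNDEFINED because its successor still points at it, and X's predecessor becomes UNREFERENCED because nothing points at it anymore. Losing three non-adjacent rows therefore yields 3 UNDEFINED and 3 UNREFERENCED. If that is what happened here, the run had 4,999,999,994 REFERENCED + 3 UNREFERENCED = 4,999,999,997 rows present, plus 3 missing, for 5,000,000,000 nodes in total.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of ITBLL-style verification; not the test's code. */
public class VerifySketch {

    /** Counters named after the Verify counts quoted above. */
    static final class Counts {
        long referenced;
        long undefined;
        long unreferenced;
    }

    /** @param prev maps each stored (defined) node key to the key it points back to */
    static Counts verify(Map<Long, Long> prev) {
        // Count how many stored nodes point at each key.
        Map<Long, Integer> refCount = new HashMap<>();
        for (long target : prev.values()) {
            refCount.merge(target, 1, Integer::sum);
        }
        Counts c = new Counts();
        // A key that is referenced but has no row of its own is UNDEFINED.
        for (long key : refCount.keySet()) {
            if (!prev.containsKey(key)) {
                c.undefined++;
            }
        }
        // A stored row is REFERENCED if some row points at it, else UNREFERENCED.
        for (long key : prev.keySet()) {
            if (refCount.containsKey(key)) {
                c.referenced++;
            } else {
                c.unreferenced++;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        int n = 1000;
        Map<Long, Long> prev = new HashMap<>();
        // One circular list: node i points back to node i-1 (node 0 to node n-1).
        for (long i = 0; i < n; i++) {
            prev.put(i, (i + n - 1) % n);
        }
        // Simulate losing three non-adjacent rows.
        prev.remove(10L);
        prev.remove(500L);
        prev.remove(900L);
        Counts c = verify(prev);
        // Prints: REFERENCED=994 UNDEFINED=3 UNREFERENCED=3
        System.out.println("REFERENCED=" + c.referenced
            + " UNDEFINED=" + c.undefined
            + " UNREFERENCED=" + c.unreferenced);
    }
}
```

Adjacent losses behave differently in this model: if two consecutive rows disappear, only the later one's successor produces an UNDEFINED key. So the 3/3 split fits best with three isolated missing rows.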
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)