[
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673171#comment-16673171
]
Ted Yu commented on HBASE-21387:
--------------------------------
From https://builds.apache.org/job/PreCommit-HBASE-Build/14932/console :
{code}
00:38:23 +1 overall
00:38:23
00:38:23 | Vote | Subsystem | Runtime | Comment
00:38:23 ============================================================================
00:38:23 | 0 | reexec | 0m 11s | Docker mode activated.
00:38:23 | 0 | patch | 0m 2s | The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.8.0/precommit-patchnames for instructions.
00:38:23 | | | | Prechecks
00:38:23 | +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns.
00:38:23 | +1 | @author | 0m 0s | The patch does not contain any @author tags.
00:38:23 | -0 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
00:38:23 | | | | master Compile Tests
00:38:23 | +1 | mvninstall | 4m 49s | master passed
00:38:23 | +1 | compile | 1m 46s | master passed
00:38:23 | +1 | checkstyle | 1m 7s | master passed
00:38:23 | +1 | shadedjars | 4m 2s | branch has no errors when building our shaded downstream artifacts.
00:38:23 | +1 | findbugs | 2m 1s | master passed
00:38:23 | +1 | javadoc | 0m 30s | master passed
00:38:23 | | | | Patch Compile Tests
00:38:23 | +1 | mvninstall | 4m 45s | the patch passed
00:38:23 | +1 | compile | 1m 50s | the patch passed
00:38:23 | +1 | javac | 1m 50s | the patch passed
00:38:23 | +1 | checkstyle | 1m 4s | the patch passed
00:38:23 | +1 | whitespace | 0m 0s | The patch has no whitespace issues.
00:38:23 | +1 | shadedjars | 4m 6s | patch has no errors when building our shaded downstream artifacts.
00:38:24 | +1 | hadoopcheck | 9m 53s | Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0.
00:38:24 | +1 | findbugs | 2m 11s | the patch passed
00:38:24 | +1 | javadoc | 0m 29s | the patch passed
00:38:24 | | | | Other Tests
00:38:24 | +1 | unit | 128m 21s | hbase-server in the patch passed.
00:38:24 | +1 | asflicense | 0m 25s | The patch does not generate ASF License warnings.
00:38:24 | | | 168m 0s |
00:38:24
00:38:24
00:38:24 || Subsystem || Report/Notes ||
00:38:24 ============================================================================
00:38:24 | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
00:38:24 | JIRA Issue | HBASE-21387 |
00:38:24 | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12946617/21387.v3.txt |
{code}
> Race condition surrounding in progress snapshot handling in snapshot cache
> leads to loss of snapshot files
> ----------------------------------------------------------------------------------------------------------
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
> Issue Type: Bug
> Reporter: Ted Yu
> Assignee: Ted Yu
> Priority: Major
> Labels: snapshot
> Attachments: 21387.v1.txt, 21387.v2.txt, 21387.v3.txt
>
>
> In a recent customer report, ExportSnapshot failed with the following error:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2]
> snapshot.SnapshotReferenceUtil: Can't find hfile:
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
> or archive
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
> directory for the primary table.
> {code}
> We found the following in the master log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427]
> cleaner.HFileCleaner: Removing:
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa
> from archive
> {code}
> The root cause is a race condition in the handling of in-progress
> snapshot(s) between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
> if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> which only excludes the temp dir, but not in-progress snapshot(s).
> Suppose that when RefreshCacheTask runs refreshCache, the
> SnapshotDirectoryInfo for the in-progress snapshot doesn't yet include all
> store files (leaving holes in the cache).
> When SnapshotHFileCleaner then calls getUnreferencedFiles(), it sees that
> lastModifiedTime is up to date, so the cache is not refreshed and the cleaner
> proceeds to check the in-progress snapshot(s). However, the snapshot has
> completed by that time, so it is no longer found among them, and the file(s)
> missing from the cache are deemed unreferenced.
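
The interleaving described above can be sketched as a small, single-threaded model. This is only an illustration under stated assumptions, not the HBase implementation: the class SnapshotCacheRaceSketch, its fields, and the file names hfile-1 and hfile-2 are hypothetical, and the two methods merely borrow the names refreshCache and getUnreferencedFiles from the description. It assumes, as the description states, that the cleaner sees lastModifiedTime as up to date and therefore does not rebuild the cache.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical toy model of the race; none of this is actual HBase code. */
final class SnapshotCacheRaceSketch {
  // Toy stand-in for the filesystem: files listed for one snapshot.
  private final Set<String> snapshotFilesOnDisk = new HashSet<>();
  private boolean snapshotInProgress = false;
  private long dirModTime = 0;

  // Toy stand-in for the snapshot file cache.
  private final Set<String> cache = new HashSet<>();
  private long lastModifiedTime = -1;

  /** Rebuilds the cache only if the directory looks newer than the cache. */
  void refreshCache() {
    if (dirModTime <= lastModifiedTime) {
      return; // cache is considered up to date
    }
    lastModifiedTime = dirModTime;
    cache.clear();
    cache.addAll(snapshotFilesOnDisk);
  }

  /** Returns the candidates that neither the cache nor an in-progress snapshot references. */
  List<String> getUnreferencedFiles(List<String> candidates) {
    refreshCache();
    List<String> unreferenced = new ArrayList<>();
    for (String file : candidates) {
      if (cache.contains(file)) {
        continue; // referenced by a cached snapshot
      }
      if (snapshotInProgress && snapshotFilesOnDisk.contains(file)) {
        continue; // referenced by an in-progress snapshot
      }
      unreferenced.add(file);
    }
    return unreferenced;
  }

  /** Replays the problematic ordering deterministically. */
  static List<String> simulate() {
    SnapshotCacheRaceSketch s = new SnapshotCacheRaceSketch();
    // 1. Snapshot starts and has listed only its first store file.
    s.snapshotInProgress = true;
    s.snapshotFilesOnDisk.add("hfile-1");
    s.dirModTime = 1;
    // 2. The periodic refresh caches the partial file list (the hole).
    s.refreshCache();
    // 3. Snapshot lists its second file and completes; by assumption the
    //    modification time observed by the cache does not change.
    s.snapshotFilesOnDisk.add("hfile-2");
    s.snapshotInProgress = false;
    // 4. The cleaner asks which archived files are unreferenced.
    return s.getUnreferencedFiles(Arrays.asList("hfile-1", "hfile-2"));
  }

  public static void main(String[] args) {
    System.out.println(simulate()); // prints [hfile-2]
  }
}
```

In this model hfile-2 belongs to a completed snapshot but is still reported as unreferenced, which mirrors the archived hfile being removed shortly before ExportSnapshot looked for it.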