[ https://issues.apache.org/jira/browse/HBASE-13430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14491287#comment-14491287 ]

Hadoop QA commented on HBASE-13430:
-----------------------------------

{color:green}+1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12724787/HBASE-13430-master-v2.patch
  against master branch at commit e75c6201c69e57416525135a397a971ad4d1b902.
  ATTACHMENT ID: 12724787

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

    {color:green}+1 hadoop versions{color}.  The patch compiles with all 
supported hadoop versions (2.4.1, 2.5.2, 2.6.0).

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

    {color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100 characters.

    {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

    {color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/13675//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13675//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/13675//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/13675//console

This message is automatically generated.

> HFiles that are in use by a table cloned from a snapshot may be deleted when 
> that snapshot is deleted
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-13430
>                 URL: https://issues.apache.org/jira/browse/HBASE-13430
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase
>            Reporter: Tobi Vollebregt
>            Priority: Critical
>              Labels: data-integrity, master
>             Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2
>
>         Attachments: HBASE-13430-master-v1.patch, 
> HBASE-13430-master-v2.patch, hbase-13430-attempted-fix.patch, 
> hbase-13430-test.patch
>
>
> We recently had a production issue in which HFiles that were still in use by 
> a table were deleted. This appears to have been caused by race conditions in 
> the order in which HFileLinks are created, combined with the fact that only 
> files younger than {{hbase.master.hfilecleaner.ttl}} are kept alive.
> This is how to reproduce:
>  * Clone a large snapshot into a new table. The clone operation must take 
> longer than {{hbase.master.hfilecleaner.ttl}} for data loss to be guaranteed.
>  * Ensure that no other table or snapshot is referencing the HFiles used by 
> the new table.
>  * Delete the snapshot. This breaks the table.
> The main cause is this:
>  * Cloning a snapshot creates the table in the {{HBASE_TEMP_DIRECTORY}}.
>  * However, it immediately creates back references to the HFileLinks that it 
> creates for the table in the archive directory.
>  * HFileLinkCleaner does not check the {{HBASE_TEMP_DIRECTORY}}, so it 
> considers all those back references deletable.
>  * The only thing that keeps them alive is the TimeToLiveHFileCleaner, but 
> only for 5 minutes (the default {{hbase.master.hfilecleaner.ttl}}).
>  * So if cloning the snapshot takes more than 5 minutes, and the HFiles 
> aren't referenced by anything else, data loss is guaranteed.
> I have a unit test reproducing the issue and I tried to fix this, but didn't 
> completely succeed. I will attach the patch shortly.
> Workarounds:
>  * Don't delete any snapshots that you cloned into a table (we used this 
> successfully; after the data loss happened, we restored the deleted snapshot 
> from backup using ExportSnapshot, which reversed the data loss).
>  * Manually check the back references and create any missing ones after 
> cloning a snapshot.
>  * Increase {{hbase.master.hfilecleaner.ttl}}. (untested)
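
The interaction described under "The main cause" can be illustrated with a toy model. This is only a sketch under stated assumptions, not HBase code: the function name, the "tmp"/"data" directory labels, and the parameters are hypothetical stand-ins, and the model assumes a back reference is removed only when neither the link-based check nor the time-to-live check protects it.

```python
DEFAULT_TTL_SECONDS = 5 * 60  # the 5-minute window mentioned in the report


def back_reference_survives(link_dir, age_seconds,
                            checked_dirs=("data",),
                            ttl_seconds=DEFAULT_TTL_SECONDS):
    """Toy model: a back reference is kept if either check vetoes deletion.

    link_dir     -- hypothetical label for where the HFileLink currently lives
                    ("tmp" while the clone is in progress, "data" afterwards)
    checked_dirs -- hypothetical labels for the directories the link check inspects
    """
    link_found = link_dir in checked_dirs    # link-based check
    within_ttl = age_seconds < ttl_seconds   # time-to-live check
    return link_found or within_ttl


# Clone still running after 1 minute: the TTL protects the back reference.
print(back_reference_survives("tmp", 60))    # True
# Clone still running after 10 minutes: nothing protects it any more.
print(back_reference_survives("tmp", 600))   # False
# Workaround from the report (untested there): a larger TTL.
print(back_reference_survives("tmp", 600, ttl_seconds=3600))  # True
```

In this model, a link that sits in the temporary directory is invisible to the link-based check, so only the TTL stands between a slow clone and the loss of its back references.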



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
