[
https://issues.apache.org/jira/browse/HBASE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matteo Bertozzi updated HBASE-7643:
-----------------------------------
Attachment: HBASE-7653-p4-v1.patch
p4-v1 removes the delete of the file on failure.
This avoids possible data loss when the rename does not succeed.
e.g. after a compaction, one of the files cannot be archived; the file
remains in the /hbase/table/region/family/ folder until the next compaction.
Worst case, the region has to perform an extra lookup in a file that was
already compacted.
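In pseudocode, the change amounts to dropping the delete branch (a sketch of the behavioral difference, not the actual patch contents; the log call is illustrative):
{code}
// before (p4-v0): a failed rename deletes the source file -> possible data loss
success = fs.rename(originalPath/fileName, archiveDir/fileName)
if (!success) fs.delete(originalPath/fileName);

// after (p4-v1): a failed rename leaves the file in
// /hbase/table/region/family/ until the next compaction
success = fs.rename(originalPath/fileName, archiveDir/fileName)
if (!success) log("archiving failed, leaving file in place");
{code}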
> HFileArchiver.resolveAndArchive() race condition and snapshot data loss
> -----------------------------------------------------------------------
>
> Key: HBASE-7643
> URL: https://issues.apache.org/jira/browse/HBASE-7643
> Project: HBase
> Issue Type: Bug
> Affects Versions: hbase-6055, 0.96.0
> Reporter: Matteo Bertozzi
> Assignee: Matteo Bertozzi
> Priority: Blocker
> Fix For: 0.96.0, 0.94.5
>
> Attachments: HBASE-7653-p4-v0.patch, HBASE-7653-p4-v1.patch
>
>
> * The master has an hfile cleaner thread (responsible for cleaning
> the /hbase/.archive dir)
> ** /hbase/.archive/table/region/family/hfile
> ** if the table/region/family directory is empty the cleaner removes it
> * The master can archive files (from another thread, e.g. DeleteTableHandler)
> * The region can archive files (from another server/process, e.g. compaction)
> The simplified file archiving code looks like this:
> {code}
> HFileArchiver.resolveAndArchive(...) {
> // ensure that the archive dir exists
> fs.mkdir(archiveDir);
> // move the file to the archive
> success = fs.rename(originalPath/fileName, archiveDir/fileName)
> // if the rename failed, delete the file without archiving
> if (!success) fs.delete(originalPath/fileName);
> }
> {code}
> Since there's no synchronization between HFileArchiver.resolveAndArchive()
> and the cleaner runs (different processes, threads, ...) you can end up in a
> situation where you are moving something into a directory that doesn't exist.
> {code}
> fs.mkdir(archiveDir);
> // HFileCleaner chore starts at this point
> // and the archiveDirectory that we just ensured to be present gets removed.
> // The rename at this point will fail since the parent directory is missing.
> success = fs.rename(originalPath/fileName, archiveDir/fileName)
> {code}
> The bad thing about deleting the file without archiving it is that if a
> snapshot, or a cloned table, relies on that file being present, you're
> losing data.
> Possible solutions
> * Create a ZooKeeper lock to notify the master ("Hey, I'm archiving
> something, wait a bit")
> * Add an RS -> Master call to let the master remove the files, avoiding this
> kind of situation
> * Avoid removing empty directories from the archive if the table exists or
> is not disabled
> * Add a retry loop around the fs.rename
> The last one, which is also the easiest, looks like this:
> {code}
> for (int i = 0; i < retries; ++i) {
> // ensure the archive directory is present
> fs.mkdir(archiveDir);
> // ----> possible race <-----
> // try to archive file
> success = fs.rename(originalPath/fileName, archiveDir/fileName);
> if (success) break;
> }
> {code}
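The retry sketch above can be reproduced outside HBase. Below is a minimal, self-contained simulation (an assumption for illustration: it uses java.nio.file instead of Hadoop's FileSystem API, and the class/method names are hypothetical). A simulated cleaner deletes the freshly created archive directory exactly once, forcing one failed rename; the retry loop then recovers without ever deleting the source file:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ArchiveRetrySim {
    // Retry loop from the sketch: re-ensure the archive dir, then rename.
    // On final failure the file is left in place (no delete).
    static boolean archiveWithRetries(Path file, Path archiveDir, int retries,
                                      Runnable concurrentCleaner) throws IOException {
        for (int i = 0; i < retries; ++i) {
            Files.createDirectories(archiveDir);   // ensure archive directory is present
            concurrentCleaner.run();               // simulated HFileCleaner interleaving
            try {
                Files.move(file, archiveDir.resolve(file.getFileName()));
                return true;                       // archived successfully
            } catch (IOException raceLost) {
                // parent directory vanished between mkdir and rename; retry
            }
        }
        return false;                              // leave the file where it is
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("archive-sim");
        Path file = Files.createFile(root.resolve("hfile"));
        Path archiveDir = root.resolve("archive");

        // Cleaner removes the empty archive dir once, as in the race description.
        boolean[] fired = { false };
        Runnable cleaner = () -> {
            if (!fired[0]) {
                fired[0] = true;
                try { Files.deleteIfExists(archiveDir); } catch (IOException ignored) {}
            }
        };

        boolean ok = archiveWithRetries(file, archiveDir, 3, cleaner);
        System.out.println(ok && Files.exists(archiveDir.resolve("hfile"))
                ? "archived" : "lost");
    }
}
```

The first attempt loses the race and the rename fails; the second attempt recreates the directory and succeeds, so the program prints "archived".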
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira