HemaKumar created HBASE-27404:
---------------------------------

             Summary: Long running ExportSnapshot fails with Can't find hfile 
Exception.
                 Key: HBASE-27404
                 URL: https://issues.apache.org/jira/browse/HBASE-27404
             Project: HBase
          Issue Type: Bug
          Components: snapshots
            Reporter: HemaKumar


ExportSnapshot jobs that run longer than the destination cluster's 
hbase.master.hfilecleaner.ttl value are failing with {_}Can't find hfile: 
<hfile> in the real or archive folders{_}. HFiles already copied to the archive 
folder are being deleted on the destination cluster by the SnapshotHFileCleaner 
cleaner.

 
 # ExportSnapshot copies the archived HFiles to the archive folders on the 
destination cluster.
 # The manifest of an in-progress ExportSnapshot stays in 
/hbase/.hbase-snapshot/.tmp until the export completes.
 # The SnapshotHFileCleaner flow ignores the /hbase/.hbase-snapshot/.tmp 
directory when it looks for snapshot reference files:

{code:java}
private void refreshCache() throws IOException {
  // just list the snapshot directory directly, do not check the modification time for the root
  // snapshot directory, as some file system implementations do not modify the parent directory's
  // modTime when there are new sub items, for example, S3.
  FileStatus[] snapshotDirs = FSUtils.listStatus(fs, snapshotDir,
    p -> !p.getName().equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME));
{code}
 # Because SnapshotHFileCleaner misses the references of the in-progress 
snapshot, TimeToLiveHFileCleaner marks the in-progress ExportSnapshot's HFiles 
that were copied more than hbase.master.hfilecleaner.ttl ago as deletable.
 # This causes ExportSnapshot to fail at the verification stage.
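
The interaction described in the steps above can be sketched with a small, self-contained model. This is not HBase code: the class CleanerRaceSketch, its methods, and the map-based stand-in for the snapshot directory are all hypothetical, and it assumes a file is removed only when both the snapshot-reference check and the TTL check consider it deletable.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the reported behaviour; names here are illustrative, not HBase APIs.
public class CleanerRaceSketch {
  static final String SNAPSHOT_TMP_DIR_NAME = ".tmp";

  // Collects hfile names referenced by snapshot dirs, optionally skipping ".tmp".
  static Set<String> referencedHFiles(Map<String, Set<String>> snapshotDirs, boolean skipTmp) {
    Set<String> refs = new HashSet<>();
    for (Map.Entry<String, Set<String>> e : snapshotDirs.entrySet()) {
      if (skipTmp && e.getKey().equals(SNAPSHOT_TMP_DIR_NAME)) {
        continue; // in-progress snapshot references are not seen
      }
      refs.addAll(e.getValue());
    }
    return refs;
  }

  // Assumption: a file is deletable only if it is unreferenced AND older than the TTL.
  static boolean isDeletable(String hfile, long ageMs, long ttlMs, Set<String> refs) {
    return !refs.contains(hfile) && ageMs > ttlMs;
  }

  public static void main(String[] args) {
    Map<String, Set<String>> dirs = new HashMap<>();
    dirs.put("completed_snapshot", Collections.singleton("hfile-a"));
    dirs.put(SNAPSHOT_TMP_DIR_NAME, Collections.singleton("hfile-b"));

    long ttlMs = 300_000L; // example TTL
    long ageMs = 600_000L; // hfile-b was copied longer ago than the TTL

    Set<String> skippingTmp = referencedHFiles(dirs, true);
    Set<String> includingTmp = referencedHFiles(dirs, false);
    System.out.println("skip .tmp:    " + isDeletable("hfile-b", ageMs, ttlMs, skippingTmp));
    System.out.println("include .tmp: " + isDeletable("hfile-b", ageMs, ttlMs, includingTmp));
  }
}
```

In this model, the file referenced only from .tmp becomes deletable as soon as its age exceeds the TTL, which matches the failure seen at the verification stage; counting the .tmp references keeps it.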

 

Workaround:

Increase hbase.master.hfilecleaner.ttl on the destination cluster to a value 
greater than the ExportSnapshot job run time.
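
As a rough aid for picking that value, the sketch below computes a TTL from an expected export duration plus a safety margin. The class CleanerTtlSketch, the method ttlMillisFor, and the 6 hour / 1 hour figures are hypothetical, and it assumes the property is interpreted in milliseconds.

```java
import java.time.Duration;

// Hypothetical helper for sizing the workaround; not part of HBase.
public class CleanerTtlSketch {
  // Returns a TTL in milliseconds that exceeds the expected export run time by the margin.
  static long ttlMillisFor(Duration expectedExportRunTime, Duration safetyMargin) {
    return expectedExportRunTime.plus(safetyMargin).toMillis();
  }

  public static void main(String[] args) {
    // Example figures only: a 6 hour export with a 1 hour margin.
    long ttlMs = ttlMillisFor(Duration.ofHours(6), Duration.ofHours(1));
    System.out.println("hbase.master.hfilecleaner.ttl=" + ttlMs);
  }
}
```

A larger TTL also means unreferenced files stay in the destination archive longer, so the margin is a trade-off against disk usage.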

 

I think this issue needs to be fixed in SnapshotHFileCleaner flow so that 
long-running ExportSnapshot jobs can succeed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
