HemaKumar created HBASE-27404:
---------------------------------

Summary: Long running ExportSnapshot fails with Can't find hfile Exception.
Key: HBASE-27404
URL: https://issues.apache.org/jira/browse/HBASE-27404
Project: HBase
Issue Type: Bug
Components: snapshots
Reporter: HemaKumar

ExportSnapshot jobs that run longer than the destination cluster's hbase.master.hfilecleaner.ttl value are failing with {_}Can't find hfile: <hfile> in the real or archive folders{_}. The HFiles already copied into the archive folder on the destination cluster are being deleted there by the HFile cleaner, because SnapshotHFileCleaner does not protect them.

# ExportSnapshot copies archived HFiles to the archive folder on the destination cluster.
# The manifest of an in-progress ExportSnapshot stays in /hbase/.hbase-snapshot/.tmp until the export completes.
# SnapshotHFileCleaner ignores the /hbase/.hbase-snapshot/.tmp directory when it looks for snapshot reference files:

{code:java}
private void refreshCache() throws IOException {
  // just list the snapshot directory directly, do not check the modification time for the root
  // snapshot directory, as some file system implementations do not modify the parent directory's
  // modTime when there are new sub items, for example, S3.
  FileStatus[] snapshotDirs = FSUtils.listStatus(fs, snapshotDir,
    p -> !p.getName().equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME));
{code}

# Because SnapshotHFileCleaner misses the references held by the in-progress snapshot, nothing protects the HFiles of the in-progress ExportSnapshot. TimeToLiveHFileCleaner then marks those HFiles that were copied more than hbase.master.hfilecleaner.ttl ago for deletion.
# This causes ExportSnapshot to fail at the verification stage.

Workaround: on the destination cluster, increase hbase.master.hfilecleaner.ttl to a value larger than the ExportSnapshot job's run time.

I think this issue needs to be fixed in the SnapshotHFileCleaner flow so that long-running ExportSnapshot jobs can succeed.
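
The sequence described above can be illustrated with a small, self-contained Java sketch. This is a toy model, not HBase code: the class CleanerRaceSketch, its methods, the flat map of snapshot directories, and the TTL and age values are all hypothetical and chosen only for illustration. It assumes that an archived HFile is removed only when every cleaner considers it deletable, and it compares the reported behaviour (references under .tmp are skipped) with a variant that also counts references from .tmp.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the cleaner decision; none of these names are HBase classes. */
public class CleanerRaceSketch {

    /** Directory that holds in-progress snapshot manifests, per the report. */
    static final String TMP_DIR = ".tmp";

    /**
     * Collects the HFiles referenced by snapshot manifests.
     * Each map key is a directory name under .hbase-snapshot (simplified to one level);
     * each value lists the HFiles that directory's manifest references.
     */
    static Set<String> referencedHFiles(Map<String, List<String>> snapshotDirs,
                                        boolean includeTmp) {
        Set<String> referenced = new HashSet<>();
        for (Map.Entry<String, List<String>> dir : snapshotDirs.entrySet()) {
            if (!includeTmp && dir.getKey().equals(TMP_DIR)) {
                continue; // models the filter in refreshCache() that skips .tmp
            }
            referenced.addAll(dir.getValue());
        }
        return referenced;
    }

    /** Assumption: a file is deleted only if it is past the TTL and unreferenced. */
    static boolean isDeletable(String hfile, long ageMs, long ttlMs, Set<String> referenced) {
        boolean ttlExpired = ageMs > ttlMs;                   // TTL cleaner's view
        boolean unreferenced = !referenced.contains(hfile);   // snapshot cleaner's view
        return ttlExpired && unreferenced;
    }

    public static void main(String[] args) {
        long ttlMs = 5 * 60 * 1000L;   // illustrative TTL value
        long ageMs = 30 * 60 * 1000L;  // HFile copied 30 minutes ago, export still running

        Map<String, List<String>> dirs = Map.of(
            "completed-snapshot", List.of("hfile-a"),
            TMP_DIR, List.of("hfile-b")); // hfile-b belongs to the in-progress export

        Set<String> skippingTmp = referencedHFiles(dirs, false);
        Set<String> includingTmp = referencedHFiles(dirs, true);

        System.out.println("skipping .tmp, hfile-b deletable: "
            + isDeletable("hfile-b", ageMs, ttlMs, skippingTmp));
        System.out.println("including .tmp, hfile-b deletable: "
            + isDeletable("hfile-b", ageMs, ttlMs, includingTmp));
    }
}
```

In this model the HFile of the in-progress export becomes deletable as soon as its age exceeds the TTL, and it stays protected once references under .tmp are counted. Whether the real fix should read .tmp in SnapshotHFileCleaner or protect these files some other way is left open here.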