I might be wrong, but I don't think the HFileLink is used in the case that you're asking about.

I think for your situation (the file no longer exists in data/ but is now in archive/), HBase can find the file purely from the table name, column family, region, and file name, by checking both locations. In other words, if you crack open the snapshot manifest, the data in there is sufficient for the code to find that file in HDFS. You could take a look at the CleanerChore(s) or the Space Quota work, as both do some work around correlating files referenced by a snapshot :)
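
To make the "check both locations" idea concrete, here is a minimal sketch. It is not HBase code: SnapshotFileLocator and its methods are hypothetical, it uses java.nio against a local filesystem rather than the Hadoop FileSystem API, and the directory layout in the comments is an assumption about how data/ and archive/ are organized under the HBase root.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper (not an HBase class): resolves a store file from the
// coordinates a snapshot manifest records, trying data/ first, then archive/.
class SnapshotFileLocator {

    // Assumed layout: <root>/data/<namespace>/<table>/<region>/<family>/<hfile>
    static Path dataPath(Path root, String ns, String table,
                         String region, String family, String hfile) {
        return root.resolve(Paths.get("data", ns, table, region, family, hfile));
    }

    // Assumed layout: <root>/archive/data/<namespace>/<table>/<region>/<family>/<hfile>
    static Path archivePath(Path root, String ns, String table,
                            String region, String family, String hfile) {
        return root.resolve(Paths.get("archive", "data", ns, table, region, family, hfile));
    }

    // Returns the first location where the file exists, or empty if it is in neither.
    static Optional<Path> locate(Path root, String ns, String table,
                                 String region, String family, String hfile) {
        Path live = dataPath(root, ns, table, region, family, hfile);
        if (Files.exists(live)) {
            return Optional.of(live);
        }
        Path archived = archivePath(root, ns, table, region, family, hfile);
        if (Files.exists(archived)) {
            return Optional.of(archived);
        }
        return Optional.empty();
    }
}
```

The point of the sketch is that nothing has to be written anywhere when a compaction archives a file: the same four coordinates resolve to a path under either directory.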

The HFileLink is a little hack to (primarily) avoid proactively rewriting files for a snapshot restore. We can write that very small "symlink" instead of copying the entire file out of archive/ and back into data/.
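
As an illustration of why the link can be so small, the sketch below parses a link file name into the coordinates of the file it points at. It assumes the name format is <table>=<region>-<hfile>, with a namespaced table written as <ns>=<table>; that format is an assumption here and should be checked against HFileLink itself. HFileLinkName is a hypothetical class, not part of HBase.

```java
// Hypothetical parser (not HBase's HFileLink): splits a link file name into
// the table, region, and store file it refers to.
class HFileLinkName {
    final String table;   // "table" or "ns:table"
    final String region;  // encoded region name
    final String hfile;   // store file name

    HFileLinkName(String table, String region, String hfile) {
        this.table = table;
        this.region = region;
        this.hfile = hfile;
    }

    // Assumed format: <table>=<region>-<hfile>; '=' also stands in for the
    // namespace delimiter ':' inside the table part.
    static HFileLinkName parse(String linkName) {
        // The last '=' ends the table part, so hyphens in the table name are safe.
        int eq = linkName.lastIndexOf('=');
        // The first '-' after it separates the region from the store file.
        int dash = linkName.indexOf('-', eq + 1);
        if (eq <= 0 || dash < 0) {
            throw new IllegalArgumentException("not a link name: " + linkName);
        }
        String table = linkName.substring(0, eq).replace('=', ':');
        String region = linkName.substring(eq + 1, dash);
        String hfile = linkName.substring(dash + 1);
        return new HFileLinkName(table, region, hfile);
    }
}
```

With those three values plus the column family directory the link sits in, the target can be looked up in data/ or archive/ the same way as any other store file.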

(and, to be clear: no, the manifest is never rewritten -- it's immutable)

On 5/1/18 7:49 PM, Zach York wrote:
Ah, of course. I missed that HFileLink handled the archive directory too.

Thanks Ted!

On Tue, May 1, 2018 at 4:35 PM, Ted Yu <yuzhih...@gmail.com> wrote:

Please take a look at HFileLink#buildFromHFileLinkPattern, where the archive
directory is taken into account.



Cheers

On Tue, May 1, 2018 at 4:23 PM, Zach York <zyork.contribut...@gmail.com>
wrote:

Hello,

This should be a fairly simple question, but I have been diving into docs
and code and haven't found anything obvious.

I have been looking at snapshot code, but have been unable to
understand/find one part. When a file that was present in a snapshot is
compacted and moved to the archive directory, how does the snapshot know
where it is? Does a compaction update the data.manifest file?

Thanks,
Zach


