I might be wrong, but I don't think the HFileLink is used in the case
that you're asking about.
I think for your situation (the file no longer exists in data/ but is now
in archive/), HBase can find the file purely from the table name, column
family, region, and file name, by checking both locations.
In other words, if you crack open the snapshot manifest, the data
contained in there is sufficient for the code to find that file in HDFS.
You could take a look at the CleanerChore(s) or the Space Quota work, as
both of them correlate files on disk with the files referenced by a snapshot :)
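To make the "check both locations" idea concrete, here is a minimal sketch. It is not HBase code: `SnapshotFileLocator` and its methods are hypothetical names, it uses the local filesystem via java.nio instead of HDFS, and it assumes a layout of data/<namespace>/<table>/<region>/<family>/<file> under the root directory with the same relative path mirrored under archive/data/. Please verify that layout against your HBase version.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical helper (not an HBase class): finds a store file named by a snapshot. */
final class SnapshotFileLocator {
    private SnapshotFileLocator() {}

    /** Assumed relative layout: namespace/table/region/family/fileName. */
    static Path relativeStoreFilePath(String namespace, String table, String region,
                                      String family, String fileName) {
        return Paths.get(namespace, table, region, family, fileName);
    }

    /**
     * Looks for the file under data/ first, then under archive/data/.
     * The manifest is never consulted for a location; the identifying
     * fields alone determine both candidate paths.
     */
    static Optional<Path> locate(Path rootDir, Path relative) {
        Path live = rootDir.resolve("data").resolve(relative);
        if (Files.exists(live)) {
            return Optional.of(live);
        }
        Path archived = rootDir.resolve("archive").resolve("data").resolve(relative);
        if (Files.exists(archived)) {
            return Optional.of(archived);
        }
        return Optional.empty();
    }
}
```

The point of the sketch is that nothing has to be rewritten when a compaction archives a file: the same identifying fields resolve to either location.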
The HFileLink is a little hack to (primarily) avoid proactively
rewriting files for a snapshot restore. We can write that very small
"symlink" instead of copying the entire file out of archive/ and back
into data/.
(and, to be clear: no, the manifest is never rewritten -- it's immutable)
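To illustrate the "symlink" idea: as far as I recall, an HFileLink carries its target in its own file name, roughly in the form table=region-hfile, with the ':' of a namespaced table name written as '='. The sketch below only mimics that naming scheme. `LinkNameSketch` is a hypothetical class, not the real HFileLink, and the exact pattern is an assumption that should be checked against the HFileLink source.

```java
/** Hypothetical illustration of a link name that encodes its target; not HBase's HFileLink. */
final class LinkNameSketch {
    private LinkNameSketch() {}

    /** Builds a name of the assumed form table=region-hfile (namespace ':' becomes '='). */
    static String linkName(String table, String region, String hfile) {
        return table.replace(':', '=') + "=" + region + "-" + hfile;
    }

    /**
     * Splits a link name back into {table, region, hfile}.
     * Parses from the right, since table names may themselves contain '-'.
     * Assumes region and hfile names contain neither '-' nor '='.
     */
    static String[] parse(String linkName) {
        int dash = linkName.lastIndexOf('-');
        if (dash < 0) {
            throw new IllegalArgumentException("not a link name: " + linkName);
        }
        int eq = linkName.lastIndexOf('=', dash);
        if (eq < 0) {
            throw new IllegalArgumentException("not a link name: " + linkName);
        }
        String table = linkName.substring(0, eq).replace('=', ':');
        String region = linkName.substring(eq + 1, dash);
        String hfile = linkName.substring(dash + 1);
        return new String[] {table, region, hfile};
    }
}
```

Since the name alone identifies the table, region, and file, the link itself can be a tiny file, and the referenced data can be looked up in data/ or archive/ without copying it.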
On 5/1/18 7:49 PM, Zach York wrote:
Ah, of course. I missed that HFileLink handled the archive directory too.
Thanks Ted!
On Tue, May 1, 2018 at 4:35 PM, Ted Yu <yuzhih...@gmail.com> wrote:
Please take a look at HFileLink#buildFromHFileLinkPattern where archive
directory is taken into account.
Cheers
On Tue, May 1, 2018 at 4:23 PM, Zach York <zyork.contribut...@gmail.com>
wrote:
Hello,
This should be a fairly simple question, but I have been diving into docs
and code and haven't found anything obvious.
I have been looking at snapshot code, but have been unable to
understand/find one part. When a file that was present in a snapshot is
compacted and moved to the archive directory, how does the snapshot know
where it is? Does a compaction update the data.manifest file?
Thanks,
Zach