On 2026/1/19 15:29, Christoph Hellwig wrote:
On Sat, Jan 17, 2026 at 12:21:16AM +0800, Gao Xiang wrote:
Hi Christoph,
On 2026/1/16 23:46, Christoph Hellwig wrote:
I don't really understand the fingerprint idea. Files with the
same content will point to the same physical disk blocks, so that
should be a much better indicator than a fingerprint? Also how does
Page cache sharing should apply to different EROFS
filesystem images on the same machine too, so the
physical disk block number idea cannot be applied
to this.
Oh. That's kinda unexpected and adds another twist to the whole scheme.
So in that case the on-disk data actually is duplicated in each image
and then de-duplicated in memory only? Ewwww...
On-disk deduplication is decoupled from this feature:
- EROFS can share the same blocks in blobs (multiple
devices) among different images, so that on-disk data
can be shared by referring to the same blobs;
- For on-disk data that is not deduplicated in the image,
if reflink is enabled on the backing fses, userspace
mounters can trigger background GCs to deduplicate the
identical blocks.
I just tried to say that EROFS doesn't limit what
`fingerprint` really means (it can be a serialized
integer number defined by a specific image publisher,
for example, or a specific secure hash; currently,
"mkfs.erofs" generates a sha256 for each file), but
leaves that to the image builders:
1) if `fingerprint` is distributed as on-disk part of
signed images, as I said, it could be shared within a
trusted domain_id (usually the same image builder) --
that is the top-priority use case, using dm-verity;
Or
2) If `fingerprint` is not distributed in the image
or images are untrusted (e.g. unknown signatures),
image fetchers can scan each inode in the golden
images to generate an extra minimal EROFS
metadata-only image with locally calculated
`fingerprint` too, which is very similar to the
current ostree way (parse remote files and calculate
digests).
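
To make case 2) a bit more concrete, here is a rough userspace sketch in Python of what "scan each inode and calculate a local `fingerprint`" could look like, assuming sha256 over the file content is used as the fingerprint. The function names (file_fingerprint, scan_tree, sharing_groups) and the "builder-1" domain string are made up for illustration; this is not mkfs.erofs or kernel code, and it does not show how fingerprints are actually stored in an image.

```python
import hashlib
import os


def file_fingerprint(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file's content (assumed fingerprint)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def scan_tree(root):
    """Map each regular file (path relative to root) to its fingerprint."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            result[os.path.relpath(full, root)] = file_fingerprint(full)
    return result


def sharing_groups(images, domain_id):
    """Group files from several scanned trees by (domain_id, fingerprint).

    `images` maps a hypothetical image name to the dict returned by
    scan_tree(). Files that land in the same group have identical content
    and are candidates for sharing within that one trusted domain.
    """
    groups = {}
    for image_name, fingerprints in images.items():
        for rel, fp in fingerprints.items():
            groups.setdefault((domain_id, fp), []).append((image_name, rel))
    return groups
```

For example, scanning two unpacked images and calling sharing_groups({"img-a": scan_tree(dir_a), "img-b": scan_tree(dir_b)}, "builder-1") puts files with the same content under the same (domain_id, fingerprint) key, regardless of which image or path they came from.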
Thanks,
Gao Xiang