Re: [gentoo-dev] [RFC] DIGESTS metadata variable for cache validation

Brian Harring Tue, 10 Feb 2009 04:21:34 -0800

On Mon, Feb 09, 2009 at 11:55:41AM -0800, Zac Medico wrote:
> All that I can say right now is that I recall questions about it in
> the past from overlay maintainers (I don't have a list) and the
> funtoo project is the only one which I can name offhand.
> 
> However, the ability to distribute cache via a vcs is only an
> ancillary feature which is made possible by the DIGESTS data. The
> DIGESTS data is useful regardless of the protocol that is used to
> distribute the cache, since it allows the cache to be properly
> validated for integrity. So, the real primary reason for introducing
> the DIGESTS data is to provide a proper solution for cases like bug
> #139134 [1] in which invalid metadata cache goes undetected.


I'm sorry, but this proposal smells something awful.  Because of the 
mtime requirement on cache entries you're proposing jamming another 
1.4MB into the cache for validation purposes (which should be 4x that 
since a full checksum really should be in there) while trying to 
maintain compatibility.

Frankly, forget compatibility- the current format could stand to die.  
The repository format is an ever growing mess- leave it as is and 
work on cutting over to something sane.

Overlay maintainers who want the latest/greatest can obviously convert 
over as well; one would hope there would be enough cleanup to make it 
worth their time.

As for the nasty gentoo-x86 compatibility, basically, do the 
following:

1) maintain the existing cvs repo as is
2) iron out what cleanup/restructuring is desired.  Jamming glep55 in 
here is one potential example.  Basically, nail down the new repo 
format (with an eye for translating the cvs repo to it on the fly).
3) use an eclass index holding the checksums, w/ the cache entries 
referencing index numbers rather than carrying the checksums 
themselves (sort the index by consumption, meaning the more ebuilds 
use an eclass, the lower its index): this brings the cache addition 
down to around 285KB (acceptable imo) while giving full flexibility 
in the checksums available for eclasses.  This is assuming the 
current flat_list format is still in use in the new repo...
4) drop the mtime requirement on cache entries and just bump the 
mtime forward whenever an entry is updated (bug 139134 goes away), 
jamming in an ebuild checksum of some sort.
5) rsync nodes are required to have 10GB of storage available- so 
storage shouldn't be an issue, but ensuring all nodes have been 
updated to sync both the old and *new* format is required.
6) suffer through cvs for a year (or whatever time frame), converting 
folks over to the new url.
7) kill the old format after whatever period deemed best (potentially 
leaving a README telling folks how to update if they're seriously 
behind).
8) convert the cvs repo to the new format, tear down the 
transformation bits.
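
To make step 3 a bit more concrete, here is a rough Python sketch of 
an eclass index sorted by consumption, with per-ebuild references 
into it.  Everything named here (build_eclass_index, 
refs_are_current, the dict layouts, the choice of SHA-1) is a 
hypothetical illustration, not an existing portage interface or an 
agreed format.

```python
import hashlib
from collections import Counter


def sha1_hex(data: bytes) -> str:
    """Checksum helper; SHA-1 is only an assumption for this sketch."""
    return hashlib.sha1(data).hexdigest()


def build_eclass_index(ebuild_eclasses, eclass_contents):
    """Build a shared eclass index plus per-ebuild references into it.

    ebuild_eclasses: {ebuild name: iterable of inherited eclass names}
    eclass_contents: {eclass name: raw bytes of the eclass file}
    Returns (index, refs); index is a list of (eclass, checksum) with
    the most widely used eclass at position 0.
    """
    usage = Counter(
        name for names in ebuild_eclasses.values() for name in set(names)
    )
    # Most-used first, so the commonest eclasses get the shortest numbers;
    # ties are broken by name to keep the ordering deterministic.
    ordered = sorted(usage, key=lambda name: (-usage[name], name))
    position = {name: i for i, name in enumerate(ordered)}
    index = [(name, sha1_hex(eclass_contents[name])) for name in ordered]
    refs = {
        ebuild: sorted(position[name] for name in set(names))
        for ebuild, names in ebuild_eclasses.items()
    }
    return index, refs


def refs_are_current(ref_numbers, index, eclass_contents):
    """True if every referenced eclass still matches its indexed checksum."""
    return all(
        sha1_hex(eclass_contents[index[i][0]]) == index[i][1]
        for i in ref_numbers
    )
```

A cache entry then only needs to carry a short list of small 
integers, and changing the checksum type only touches the index, not 
every entry.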

Yes, the plan above is coarse- there aren't any glaring holes as far 
as I can see, however.  It does place restrictions on the repo format 
chosen, but careful choices in the new format (heavy format 
versioning) should make it possible to make this sort of issue less 
of a pain down the line.


At the very least, doing a different repo format for repos/overlays 
stored in a vcs that doesn't track mtime would solve their issues- it 
also has the nice benefit of not making the repo more bloated for the 
99% of folk who didn't even hit the issues spawning this.

If gentoo-x86 is left as is, bug 139134 can be headed off w/out 
jamming a new metadata key in; to be clear, I'm likely going to 
"Special Hell" for suggesting this, but if the mtime/size on the new 
cache entry is the same as on the old one, append a space to the 
value in the description field.

All sane managers ought to be doing basic clean up of that value 
anyways in their data layer (let alone at the UI level), but it's 
enough to make rsync behave.
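
A rough sketch of that hack in Python, for the sake of argument.  By 
default rsync skips files whose size and mtime both match, so a 
changed entry with identical size and mtime never gets transferred; 
one extra byte is enough to break the tie.  The KEY=value 
serialization and the function names below are hypothetical 
stand-ins, not the real cache format or any portage API.

```python
def serialize(fields):
    """Hypothetical on-disk form: one KEY=value line per field."""
    return "".join(f"{key}={fields[key]}\n" for key in sorted(fields)).encode()


def pad_if_rsync_would_miss(old_fields, old_mtime, new_fields, new_mtime):
    """Return the fields to write for the regenerated cache entry.

    If the content changed but size and mtime are both identical to the
    old entry, append a space to DESCRIPTION so the size differs.
    """
    result = dict(new_fields)
    old_blob, new_blob = serialize(old_fields), serialize(result)
    if (
        new_blob != old_blob
        and new_mtime == old_mtime
        and len(new_blob) == len(old_blob)
    ):
        result["DESCRIPTION"] = result.get("DESCRIPTION", "") + " "
    return result


def description(fields):
    """Reader side: strip the padding, as a sane data layer would anyway."""
    return fields.get("DESCRIPTION", "").strip()
```

Consumers that already strip whitespace from the description never 
see the padding.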

So... flame away.

~brian
