On Wed, Mar 15, 2006 at 09:53:24PM -0800, Zac Medico wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Marius Mauch wrote:
> > Marius Mauch schrieb:
> >> The first should be delayed until there is some consensus how the gpg
> >> stuff should work in the future, the others I don't see the use for.
> >> Also I only checked portage.py for changes, so emerge/repoman/... might
> >> still have to be fixed.
> >> Last but not least: I did some basic testing with this and the
> >> important stuff seems to work, but I'm quite sure the code still has a
> >> lot of bugs/issues, and this being a core functionality it needs a
> >> *lot* of testing, so I'd really appreciate if you could all give it a
> >> spin (but do not commit anything to the tree without manually checking
> >> it first).
> > 
> > Does the lack of feedback (only got a reaction from Brian so far) mean
> > that noone tried it or that it doesn't have any issues?
> 
> The patch applies and seems to work well.  At a quick glance the code looks 
> pretty clean and it's nice to migrate more code out of portage.py to a 
> separate module.  I've attached a refreshed version of the patch that applies 
> cleanly against current svn (I've made no changes).
> 
> Zac
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
> 
> iD8DBQFEGP1S/ejvha5XGaMRAl/7AJ9cZbjhWtjCz+ac2/tjQNUoivj0twCg7xAG
> cYvDbMiqU5HtpNrVk7fs6RM=
> =Eqlo
> -----END PGP SIGNATURE-----

> === added file 'pym/portage_manifest.py'
> --- /dev/null 
> +++ pym/portage_manifest.py   
> @@ -0,0 +1,314 @@
> +import os, sets
> +
> +import portage, portage_exception, portage_versions, portage_const
> +from portage_checksum import *
> +from portage_exception import *
> +
> +class FileNotInManifestException(PortageException):
> +     pass
> +
> +def manifest2AuxfileFilter(filename):
> +     filename = filename.strip("/")
> +     return not (filename in ["CVS", ".svn"] or filename[:len("digest-")] == 
> "digest-")
> +
> +def manifest2MiscfileFilter(filename):
> +     filename = filename.strip("/")
> +     return not (filename in ["CVS", ".svn", "files", "Manifest"] or 
> filename[-7:] == ".ebuild")

python -m timeit -s 's="asdf"*400;s+="fdsa.ebuild"' 's.endswith(".ebuild")'
1000000 loops, best of 3: 0.88 usec per loop

python -m timeit -s 's="asdf"*400;s+="fdsa.ebuild"' 's[-7:] == ".ebuild"'
1000000 loops, best of 3: 0.564 usec per loop

Use endswith

oddly, worth noting that startswith differs in this behaviour...
python -m timeit -s 's="asdf"*400;s+="fdsa.ebuild"' 's[:7] == ".ebuild"'
1000000 loops, best of 3: 0.592 usec per loop

python -m timeit -s 's="asdf"*400;s+="fdsa.ebuild"' 's.startswith(".ebuild")'
1000000 loops, best of 3: 0.842 usec per loop

> +class Manifest(object):
> +     def __init__(self, pkgdir, db, mysettings, 
> hashes=portage_const.MANIFEST2_HASH_FUNCTIONS, manifest1_compat=True, 
> fromScratch=False):
> +             self.pkgdir = pkgdir+os.sep

rstrip os.sep prior to adding it

> +             self.fhashdict = {}
> +             self.hashes = hashes
> +             self.hashes.append("size")
> +             if manifest1_compat:
> +                     
> self.hashes.extend(portage_const.MANIFEST1_HASH_FUNCTIONS)
> +             self.hashes = sets.Set(self.hashes)
> +             for t in portage_const.MANIFEST2_IDENTIFIERS:
> +                     self.fhashdict[t] = {}
> +             self._read()
> +             self.compat = manifest1_compat
> +             self.db = db
> +             self.mysettings = mysettings
> +             if mysettings.has_key("PORTAGE_ACTUAL_DISTDIR"):
> +                     self.distdir = mysettings["PORTAGE_ACTUAL_DISTDIR"]
> +             else:
> +                     self.distdir = mysettings["DISTDIR"]

Why pass in mysettings?  
Have the code push it in, manifest shouldn't know about the DISTDIR 
key nor PORTAGE_ACTUAL_DISTDIR, should just have a directory to look 
in.


> +     def guessType(self, filename):
> +             if filename.startswith("files/digest-"):
> +                     return None
> +             if filename.startswith("files/"):

if you're intent on using os.sep, might want to correct the two '/' 
uses above to use os.path.join/os.path.sep

If concerned about cost, just calculate it once in the class namespace 
as a constant.

related, might I suggest converting away from internal strings to a 
class level enumeration?

int comparison is faster then string, plus it unbinds the internal 
code from the on disk symbols used (eg, just cause on disk is AUX 
doesn't mean internally it should be throwing around "AUX").

> +                     return "AUX"
> +             elif filename.endswith(".ebuild"):
> +                     return "EBUILD"
> +             elif filename in ["ChangeLog", "metadata.xml"]:
> +                     return "MISC"
> +             else:
> +                     return "DIST"
> +     
> +     def getFullname(self):
> +             return self.pkgdir+"Manifest"

Err... move that into the initializer.

If you're concerned folks will screw up the var, use a property to 
make it immutable.

Either way, func calls aren't cheap, and that's not really needed :)

> +     
> +     def getDigests(self):
> +             rval = {}
> +             for t in portage_const.MANIFEST2_IDENTIFIERS:
> +                     rval.update(self.fhashdict[t])
> +             return rval
> +     
> +     def _readDigests(self):
> +             mycontent = ""
> +             for d in portage.listdir(self.pkgdir+"files", filesonly=True, 
> recursive=False):
> +                     if d.startswith("digest-"):
> +                             mycontent += open(self.pkgdir+"files"+os.sep+d, 
> "r").read()
> +             return mycontent

Any reason to use portage.listdir?

It horks up the import's a bit, bloats the portage.listdir cache 
for data that shouldn't be accessed all that often.

> +             
> +     def _read(self):
> +             if not os.path.exists(self.getFullname()):
> +                     return
> +             fd = open(self.getFullname(), "r")
> +             mylines = fd.readlines()
> +             fd.close()
try:
  mylines = open(self.getFullname(), "r").readlines()
except OSError, oe:
  if oe.errno == errno.ENOENT:
     return
  raise

one less stat.

> +             mylines.extend(self._readDigests().split("\n"))
> +             for l in mylines:
> +                     myname = ""
> +                     mysplit = l.split()
> +                     if len(mysplit) == 4 and mysplit[0] in 
> portage_const.MANIFEST1_HASH_FUNCTIONS:
> +                             myname = mysplit[2]
> +                             mytype = self.guessType(myname)
> +                             if mytype == "AUX" and 
> myname.startswith("files/"):
> +                                     myname = myname[6:]
> +                             if mytype == None:
if mytype is None
rather then if mytype == None

None is singleton instance, explicit is check is faster (and never 
will bite you in the ass, same for True and False)

As stated above, personally I'd convert that over to a class level 
dict mapping external symbols to internal.

> +                                     continue
> +                             mysize = int(mysplit[3])
> +                             myhashes = {mysplit[0]: mysplit[1]}
> +                     if len(mysplit) > 4 and mysplit[0] in 
> portage_const.MANIFEST2_IDENTIFIERS:
> +                             mytype = mysplit[0]
> +                             myname = mysplit[1]
> +                             mysize = int(mysplit[2])
> +                             myhashes = dict(zip(mysplit[3::2], 
> mysplit[4::2]))
> +                     if len(myname) == 0:
> +                             continue
> +                     if not self.fhashdict[mytype].has_key(myname):
> +                             self.fhashdict[mytype][myname] = {} 
self.fhashdict[mytype].setdefault(myname, {})


> +                     self.fhashdict[mytype][myname].update(myhashes)
> +                     self.fhashdict[mytype][myname]["size"] = mysize
> +     
> +     def _writeDigests(self):
> +             cpvlist = [self.pkgdir.rstrip("/").split("/")[-2]+"/"+x[:-7] 
> for x in portage.listdir(self.pkgdir) if x.endswith(".ebuild")]
again, if using os.sep (trying to be os agnostic), nuke the "/" usage
that's kinda fugly also- just chop splice x starting at the len of the 
basedir.

> +             rval = []
> +             for cpv in cpvlist:
> +                     dname = 
> self.pkgdir+"files"+os.sep+"digest-"+portage.catsplit(cpv)[1]
os.path.join...

> +                     mylines = []
> +                     distlist = self._getCpvDistfiles(cpv)
> +                     for f in self.fhashdict["DIST"].keys():
> +                             if f in distlist:
> +                                     for h in 
> self.fhashdict["DIST"][f].keys():
> +                                             if h not in 
> portage_const.MANIFEST1_HASH_FUNCTIONS:
> +                                                     continue
> +                                             myline = " ".join([h, 
> str(self.fhashdict["DIST"][f][h]), f, str(self.fhashdict["DIST"][f]["size"])])
> +                                             mylines.append(myline)
> +                     fd = open(dname, "w")
> +                     fd.write("\n".join(mylines))
> +                     fd.write("\n")
> +                     fd.close()
rather then building your own list and then flushing it, use 
zmedico's atomic_write and write straight to the file obj; forego the 
intermediate building of the list and rely on file buffering (file 
buffering will occur anyways).

> +                     rval.append(dname)
> +             return rval
> +     
> +     def _addDigestsToManifest(self, digests, fd):
> +             mylines = []
> +             for dname in digests:
> +                     myhashes = perform_multiple_checksums(dname, 
> portage_const.MANIFEST1_HASH_FUNCTIONS+["size"])
> +                     for h in myhashes.keys():
> +                             mylines.append((" ".join([h, str(myhashes[h]), 
> os.path.join("files", os.path.basename(dname)), str(myhashes["size"])])))
> +             fd.write("\n".join(mylines))
> +             fd.write("\n")
Same thing here; just dump to the fd directly

> +     def _write(self, fd):
> +             mylines = []
> +             for t in self.fhashdict.keys():
> +                     for f in self.fhashdict[t].keys():
> +                             myline = " ".join([t, f, 
> str(self.fhashdict[t][f]["size"])])
> +                             myhashes = self.fhashdict[t][f]
> +                             for h in myhashes.keys():

drop the .keys, just do for h in myhashes.
myhashes will default to iterkeys which is faster, and less code 
involved.

> +                                     if h not in 
> portage_const.MANIFEST2_HASH_FUNCTIONS:
> +                                             continue
I'd moved that into a filter/iterfilter of myhashes in the for loop, 
but your call

> +                                     myline += " "+h+" "+str(myhashes[h])
> +                             mylines.append(myline)
> +                             if self.compat and t != "DIST":
> +                                     for h in myhashes.keys():
> +                                             if h not in 
> portage_const.MANIFEST1_HASH_FUNCTIONS:
> +                                                     continue
> +                                             mylines.append((" ".join([h, 
> str(myhashes[h]), f, str(myhashes["size"])])))
> +             fd.write("\n".join(mylines))
> +             fd.write("\n")
and here...
> +
> +     def write(self, sign=False):
> +             fd = open(self.getFullname(), "w")
> +             self._write(fd)
> +             if self.compat:
> +                     digests = self._writeDigests()
> +                     self._addDigestsToManifest(digests, fd)
> +             fd.close()
> +             if sign:
> +                     self.sign()
> +     
> +     def sign(self):
> +             raise NotImplementedError()
> +     
> +     def validateSignature(self):
> +             raise NotImplementedError()
> +     
> +     def addFile(self, ftype, fname, hashdict=None):
> +             if not os.path.exists(self.pkgdir+fname):
> +                     raise FileNotFound(fname)
> +             if not ftype in portage_const.MANIFEST2_IDENTIFIERS:
> +                     raise InvalidDataType(ftype)
> +             self.fhashdict[ftype][fname] = {}
> +             if hashdict != None:
> +                     self.fhashdict[ftype][fname].update(hashdict)
> +             if not portage_const.MANIFEST2_REQUIRED_HASH in 
> self.fhashdict[ftype][fname].keys():
> +                     self.updateFileHashes(ftype, fname)
extra stat there, would just catch the throw from updatefilehashs 
personally.

> +     
> +     def removeFile(self, ftype, fname):
> +             del self.fhashdict[ftype][fname]
> +     
> +     def hasFile(self, ftype, fname):
> +             return (fname in self.fhashdict[ftype].keys())
ick, no.  you've forced that to be linear when a return fname in 
self.fhashdict[type] can uses __contains__ which is just a hash look 
up.

> +     
> +     def findFile(self, fname):
> +             for t in portage_const.MANIFEST2_IDENTIFIERS:
> +                     if fname in self.fhashdict[t]:
> +                             return t
> +             return None
These funcs really make me think you should just access fhashdict 
directly...

> +     def create(self, checkExisting=False, assumeDistfileHashes=True):
> +             """ Recreate this Manifest from scratch, not using any existing 
> checksums
> +             (exception: if assumeDistfileHashes is true then existing DIST 
> checksums are
> +             reused if the file doesn't exist in DISTDIR."""
> +             if checkExisting:
> +                     self.checkAllHashes()
> +             if assumeDistfileHashes:
> +                     distfilehashes = self.fhashdict["DIST"]
> +             else:
> +                     distfilehashes = {}
> +             self.__init__(self.pkgdir, self.db, self.mysettings, 
> fromScratch=True)
> +             for f in portage.listdir(self.pkgdir, filesonly=True, 
> recursive=False):
> +                     if f.endswith(".ebuild"):
> +                             mytype = "EBUILD"
> +                     elif manifest2MiscfileFilter(f):
> +                             mytype = "MISC"
> +                     else:
> +                             continue
> +                     self.fhashdict[mytype][f] = 
> perform_multiple_checksums(self.pkgdir+f, self.hashes)
> +             for f in portage.listdir(self.pkgdir+"files", filesonly=True, 
> recursive=True):
> +                     if not manifest2AuxfileFilter(f):
> +                             continue
> +                     self.fhashdict["AUX"][f] = 
> perform_multiple_checksums(self.pkgdir+"files"+os.sep+f, self.hashes)
> +             cpvlist = [self.pkgdir.rstrip("/").split("/")[-2]+"/"+x[:-7] 
> for x in portage.listdir(self.pkgdir) if x.endswith(".ebuild")]
> +             distlist = []
> +             for cpv in cpvlist:
> +                     distlist.extend(self._getCpvDistfiles(cpv))
> +             for f in distlist:
> +                     fname = self.distdir+os.sep+f
> +                     if os.path.exists(fname):
> +                             self.fhashdict["DIST"][f] = 
> perform_multiple_checksums(fname, self.hashes)
> +                     elif assumeDistfileHashes and f in 
> distfilehashes.keys():
> +                             self.fhashdict["DIST"][f] = distfilehashes[f]
> +                     else:
> +                             raise FileNotFound(fname)                       
> +     
> +     def _getAbsname(self, ftype, fname):
> +             if ftype == "DIST":
> +                     absname = self.distdir+os.sep+fname
os.path.join...

> +             elif ftype == "AUX":
> +                     absname = os.sep.join([self.pkgdir, "files", fname])
os.path.join.  that one is just fugly btw ;)

> +             else:
> +                     absname = self.pkgdir+os.sep+fname
same...

> +             return absname  
> +     
> +     def checkAllHashes(self, ignoreMissingFiles=False):
> +             for t in portage_const.MANIFEST2_IDENTIFIERS:
> +                     self.checkTypeHashes(t, 
> ignoreMissingFiles=ignoreMissingFiles)
> +     
> +     def checkTypeHashes(self, idtype, ignoreMissingFiles=False):
> +             for f in self.fhashdict[idtype].keys():
> +                     self.checkFileHashes(idtype, f, 
> ignoreMissing=ignoreMissingFiles)
> +     
> +     def checkFileHashes(self, ftype, fname, ignoreMissing=False):
> +             myhashes = self.fhashdict[ftype][fname]
> +             ok,reason = verify_all(self._getAbsname(ftype, fname), 
> self.fhashdict[ftype][fname])
> +             if not ok:
> +                     raise DigestException(tuple([self._getAbsname(ftype, 
> fname)]+list(reason)))
> +             return ok, reason
> +
> +     def checkCpvHashes(self, cpv, checkDistfiles=True, onlyDistfiles=False, 
> checkMiscfiles=False):
> +             """ check the hashes for all files associated to the given cpv, 
> include all
> +             AUX files and optionally all MISC files. """
> +             if not onlyDistfiles:
> +                     self.checkTypeHashes("AUX", ignoreMissingFiles=False)
> +                     if checkMiscfiles:
> +                             self.checkTypeHashes("MISC", 
> ignoreMissingFiles=False)
> +                     ebuildname = portage.catsplit(cpv)[1]+".ebuild"
> +                     self.checkFileHashes("EBUILD", ebuildname, 
> ignoreMissing=False)
> +             if checkDistfiles:
> +                     if onlyDistfiles:
> +                             for f in self._getCpvDistfiles(cpv):
> +                                     self.checkFileHashes("DIST", f, 
> ignoreMissing=False)
> +     
Personally... these funcs seem wrong to me.

Manifest (to me) strikes me as just a data store; 
verification/generation of chksums should be external...

your call however.

> +     def _getCpvDistfiles(self, cpv):
> +             """ Get a list of all DIST files associated to the given cpv """
> +             return self.db.getfetchlist(cpv, mysettings=self.mysettings, 
> all=True)[1]

Manifest code knowing about dbapi instances seems really rather icky 
to me...

~harring

Attachment: pgpn6ETeklH3k.pgp
Description: PGP signature

Reply via email to