On 2011-09-13 18:04, Niels Thykier wrote:
> Package: lintian
> Severity: important
>
>
> Jakub realized the source of a lot of our errors on lintian.d.o are
> caused by limitations in the file-system.  We should probably use
> a pool or something similar to reduce the amount of elements in
> each dirs.
>
> ~Niels
>
>
>

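For illustration only, here is one way a "pool" could keep the number of
entries per directory down. It borrows the bucket convention of the Debian
archive pool (first letter, or "lib" plus one letter). The function names,
the $lab/pool/... layout and the entry naming are assumptions made for this
sketch, not the layout used by the lab-refactor branch, and it is written in
Python for brevity although Lintian itself is Perl.

```python
import posixpath


def pool_prefix(name):
    """Bucket name as in the Debian archive pool: 'libf' for libfoo, else first letter."""
    if name.startswith("lib") and len(name) > 3:
        return name[:4]
    return name[:1]


def pool_path(lab_dir, name, version, pkg_type):
    """Hypothetical entry path: <lab>/pool/<prefix>/<name>/<name>_<version>_<type>."""
    entry = f"{name}_{version}_{pkg_type}"  # assumed naming, for illustration
    return posixpath.join(lab_dir, "pool", pool_prefix(name), name, entry)
```

With this kind of layout no single directory has to hold one entry per
package in the archive; each bucket only holds the packages that share its
prefix.
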
I guess it might be a good time for a little status update here.

The lab-refactor branch is now working for "simple" use cases[1].
However, the lintian.d.o-style usage needs some attention.

In the master branch we use $lab/info/* as a list of "what was in the
mirror last time we checked".  Those files have been repurposed in the
lab-refactor branch, where their new meaning is "what is currently in
the lab".  This means that "dist" search[2] is currently broken.

To my knowledge there are *2* known cases where "dist" searches make
sense - lintian.d.o and lintian.debathena.o.  I feel we should move
that functionality to a new frontend (such as the "lintian-harness"[3])
that would focus on lintian.d.o-like setups.

Note that the "repurposing" is not entirely complete and therefore
reporting/harness is more or less broken right now.  One of the issues
is that unpack/* still use the files in info/* as a dist list and not
a lab list.

I also considered adding a file in info/ to keep track of lab-wide
(meta)data, such as the lab-format.  In the old lab format, this is
stored in every entry.  This makes it slightly more difficult to check
if we are dealing with a compatible lab.  Consider what happens if you
point an "old" lintian at a new-style lab - they do not store the
entries in the same place, so it has no reliable way to detect that it
is not compatible.  I would prefer that an old lintian would always be
able to say "The lab uses a newer lab-format than this version of
lintian supports" - even if this case will "probably never happen".

I am also wondering what we need in the "per-entry" lintian-status
file.  In the master branch, we store Lintian-Version, Lab-Format,
Package (name), Version (package), Type (package) and Timestamp.  When
we read the status file, we compare lab-format, package version and
timestamp.  With the changes in the lab-refactor branch, the lab always
supports multiple versions of the same package, thus the package
version comparison is a no-op.

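The lab-wide format check discussed above could look roughly like the
following.  This is a minimal sketch in Python (Lintian itself is Perl), and
everything specific in it is an assumption: the file name info/lab-info, the
"Lab-Format" field in that file, the numeric format value and the constant
SUPPORTED_LAB_FORMAT are all made up for illustration.

```python
import os

SUPPORTED_LAB_FORMAT = 10  # hypothetical: newest lab format this tool understands


def read_lab_format(lab_dir):
    """Return the lab-wide format as an int, or None if it cannot be found."""
    path = os.path.join(lab_dir, "info", "lab-info")  # hypothetical file name
    try:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                key, sep, value = line.partition(":")
                if sep and key.strip().lower() == "lab-format":
                    return int(value.strip())
    except FileNotFoundError:
        return None
    return None


def check_lab(lab_dir):
    """Classify a lab as 'compatible', 'too-new' or 'unknown'."""
    fmt = read_lab_format(lab_dir)
    if fmt is None:
        return "unknown"      # no lab-wide metadata: cannot tell
    if fmt > SUPPORTED_LAB_FORMAT:
        return "too-new"      # "newer lab-format than this version supports"
    return "compatible"       # simplification: older formats are not handled here
```

The point of a single well-known file is that even a tool that does not know
where the entries live can still read the format once and refuse to touch a
lab it does not understand.
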
As I understand it, the timestamp is there to make lintian "re-unpack"
the package if it changed since the last run.  Currently it completely
removes the entry if the timestamp does not "match".  However, this
code only makes sense for "personal" static labs - in the lintian.d.o
case, the version of a package cannot be reused (at least not in
general).  The timestamp part is not in the lab-refactor branch (yet?).

I am considering replacing the "Lab-format" value with an
"entry-format-version".  Not sure it makes sense, but I think it may
make migration to newer formats easier.  If I had not (ab)used the
opportunity to do optimizations in the .lintian-status file (see
below), the migration from the current to the new lab format would
basically just have been a bunch of "mv X Y" + updating info/*.

Finally, I have added a "Collections" entry to the .lintian-status
file.  This is used to keep track of which collections have been run
and removes the need for ".$coll-$ver" files.  This will reduce our
(expected) file-creation from 18 to 1 per binary package[4].  For a
full mirror run, 18 files per binary package roughly translate to
630 000 files[5].  For udebs and sources we go from 10 and 8 to 1,
respectively.

So to sum it up:

 * I am repurposing $lab/info/* files to be a manifest of what is in
   the lab (rather than what is on the mirror).
 * I am breaking "dist" search and suggest we create a separate
   frontend for archive-checks that supports "dist" search.
 * I am considering adding a metadata file in $lab/info/ to include
   stuff like the "Lab format" version.
 * I have removed data from the (per-entry) .lintian-status files.
 * The (per-entry) ".$coll-$ver" files will be removed and the
   .lintian-status file will track those.

Any comments?  If not I will (hopefully) get the branch ready to be
merged into master within 2-3 weeks - so if you have not reviewed the
branch yet, now would be a good time to start.
:)

~Niels

[1] That would be single package checks:

      lintian $pkg

    but also simple static-lab usage

      lintian --lab $lab --setup-static
      lintian --lab $lab --unpack $pkg[,..., $pkgN]
      lintian --lab $lab -r $pkg[,..., $pkgN]

    etc.

[2] The "check packages from mirror" search, i.e.

      lintian --lab $lab $pkg[,...,$pkgN]

    will first check the mirror and then fall back to the lab.  I
    suggest we only check the lab in this case.

[3] http://lists.debian.org/debian-lint-maint/2011/08/msg00285.html

[4] 17 binary collections + 1 lintian status file.

[5] Assumes 35 000 binary packages.  Though currently "only" 576 000
    files are created due to the file system limitations (~32 000
    binary packages).
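
To make the "Collections" entry and the numbers in [4] and [5] concrete, here
is a rough Python sketch (Lintian itself is Perl).  The mail does not specify
how the field value is encoded, so the "name=version, name=version" syntax,
the function names and the example collection names are assumptions made
purely for illustration; only the field name "Collections" and the file
counts come from the text above.

```python
def parse_status(text):
    """Parse 'Key: value' lines of a status file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def collections_done(fields):
    """Return {collection: version} from the 'Collections' field (assumed syntax)."""
    done = {}
    for item in fields.get("Collections", "").split(","):
        item = item.strip()
        if item:
            name, _, version = item.partition("=")
            done[name] = int(version)
    return done


def mark_done(fields, name, version):
    """Record that a collection has been run, instead of creating a marker file."""
    done = collections_done(fields)
    done[name] = version
    fields["Collections"] = ", ".join(f"{n}={v}" for n, v in sorted(done.items()))
    return fields


def files_created(packages, files_per_package):
    """Total number of bookkeeping files for a run."""
    return packages * files_per_package
```

With the figures from [4] and [5], files_created(35000, 18) gives 630 000 and
files_created(32000, 18) gives 576 000, whereas one status file per binary
package gives 35 000 for the same 35 000 packages.
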