Hi!

On Tue, 2013-05-14 at 09:30:45 +0200, Helmut Grohne wrote:
> I acknowledge that I am coming late to the party. I dug into the
> discussion referenced from your other mail, but had a hard time finding
> specific arguments. This discussion appears to be a good candidate for
> http://wiki.debian.org/Debate even though it is probably too late to
> start that now.
(Personally I think I'd rather summarize all pros and cons of the
different possible proposals in a single place, though.)

I referenced (in <20130513164531.ga18...@gaara.hadrons.org>) a mail with
a list of some of the advantages:

  [adv] <http://lists.debian.org/debian-devel/2012/06/msg00314.html>

Something not listed there is that I keep finding our handling of when
to ship copyright files (and changelogs, to a lesser degree) a bit
suspect. We supposedly want all binary packages to ship copyright info,
and have refactored extremely common licenses into base-files, but some
binary packages (from the same source) are still not self-contained:
they require a depended-on package to be present to provide the required
copyright file, through the usually problematic symlinked doc dir, just
to avoid the duplication.

> On Mon, May 13, 2013 at 03:16:57PM +0200, Guillem Jover wrote:
> > dpkg supports --control-show and --control-list (already in wheezy), which
> > can be used for stuff like:
> >
> >   $ dpkg-query --control-show dpkg changelog
> >
> > for installed packages, for example. Or:
> >
> >   $ dpkg-deb --info foo.deb changelog
>
> Maybe someone can point me to previous discussion answering aspects of
> the following questions?
>
> 1) Raphael Hertzog suggested[1] that metadata could be stored
>    compressed. Is that implemented already? (As far as I can see it
>    would be part of file_show, but isn't.) If not, that would cause
>    an increase in installation size. I guess that a typical desktop
>    system would grow by about 50MB.

Already listed in [adv], which also mentions that they could be de-duped
(so, yes Michael :). None of the possible db optimizations have been
implemented, because they would mostly be of benefit if these new files
are installed in the dpkg db, so doing the work while it was not yet
clear whether it would be used seemed a possible waste of time, or
unnecessary added complexity.
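To put rough numbers on the two db optimizations mentioned above (de-dup and compression) for a given system, something like the following shell sketch could be used. It is not part of dpkg: the function names dedup_savings and gzip_sizes are made up for illustration, the default 4 KiB block size is an assumption about the filesystem, and gzip -9 merely stands in for whatever compressor dpkg might end up using.

```sh
#!/bin/sh
# Sketch only: estimates what de-dup and compression could save for the
# regular files in one directory (default: the dpkg db info dir).
# Assumptions: each file occupies whole blocks of block_size bytes
# (4096 by default), gzip -9 stands in for the real compressor, and
# file names contain no newlines.

# Prints the bytes saved by keeping one copy of each set of identical files.
dedup_savings() {
    dir=${1:-/var/lib/dpkg/info}
    block_size=${2:-4096}
    find "$dir" -maxdepth 1 -type f | while IFS= read -r f; do
        # One "<md5> <size in bytes>" line per file.
        printf '%s %s\n' "$(md5sum < "$f" | cut -d' ' -f1)" "$(wc -c < "$f")"
    done | awk -v bs="$block_size" '
        { count[$1]++; size[$1] = $2 }
        END {
            saved = 0
            for (h in count) {
                # Round each file up to whole blocks; all copies but one go.
                blocks = int((size[h] + bs - 1) / bs)
                saved += (count[h] - 1) * blocks * bs
            }
            print saved
        }'
}

# Prints "<raw bytes> <gzip -9 bytes>" summed over the directory's files.
gzip_sizes() {
    dir=${1:-/var/lib/dpkg/info}
    find "$dir" -maxdepth 1 -type f | while IFS= read -r f; do
        printf '%s %s\n' "$(wc -c < "$f")" "$(gzip -9 -c < "$f" | wc -c)"
    done | awk '{ raw += $1; packed += $2 } END { print raw + 0, packed + 0 }'
}
```

For example, running dedup_savings with no arguments reports the bytes a single-copy scheme could reclaim in /var/lib/dpkg/info, and dedup_savings /var/lib/dpkg/info 1 approximates the byte-exact figure for a filesystem with fine-grained allocation.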
Also, one of the advantages of having this in the dpkg db is that these
improvements can be deployed transparently and incrementally, as they'd
be an internal implementation detail.

Checking a currently installed sid system, I get 5 files that repeat
themselves more than 10 times:

  $ md5sum /var/lib/dpkg/info/* | sort | uniq -c -d -w 32 | \
    sort -n | grep -E '^ *[0-9]{2,} '
       13 ffed98e3b35997a540e63501bd575415
       60 b8d01f7a8639f5710427ec1aca71c2df
       74 574b713906c216aa174737c0322d1b4b
      709 5d0e769df33e016b3c52a0971e8d258c
      734 dfdc3bad88b6e98080891d6323e2f58e

De-duplicating these would currently save close to 6 MiB (assuming each
file takes a 4 KiB block), or near 200 KiB on a filesystem with
finer-grained allocation; both amounts are quite insignificant.

> Note that the Emdebian crush policy requires copyright files to be
> compressed and changelogs to be absent.

These files being missing should never be an issue. I can also see
allowing the compression to be configured at dpkg build time, at
runtime, or both, depending on what we see makes more sense.

> 2) Some users may want to save disk space by elevating dpkg.cfg
>      path-exclude=/usr/share/doc/*/changelog*
>      path-exclude=/usr/share/doc/*/copyright
>    This saves about 50MB on a desktop system. Is there a feature to
>    systematically drop meta data? Being in the ball park of less than a
>    percent of the installation size I am not sure whether this is worth
>    the effort.

Ansgar asked the same, and yeah, I'm not sure that's worth it either,
even less so once de-dup and compression might be in place; and what
I've usually seen is people discarding the doc dir except for the
copyright and changelog files. But I'd be very open to considering
options to drop these at install time if we see it does make a
difference, or makes sense in places where it was possible before,
because I can see this could be considered a regression.

Thanks,
Guillem
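The common setup mentioned above (discarding the doc dir except for the copyright and changelog files) can already be expressed with dpkg's path-exclude and path-include filters. A minimal sketch follows; the fragment name 01-nodoc and the function name write_doc_filter are hypothetical choices. These filters act on paths under /usr/share/doc, so presumably a separate option would be needed if the files moved into the dpkg db.

```sh
#!/bin/sh
# Sketch only: writes a dpkg config fragment that drops /usr/share/doc
# contents except copyright and changelog files. "01-nodoc" is a
# hypothetical file name inside dpkg's config fragment directory.
# dpkg applies the filters when unpacking, so files already installed
# stay until their package is reinstalled or upgraded.
write_doc_filter() {
    confdir=${1:-/etc/dpkg/dpkg.cfg.d}
    mkdir -p "$confdir"
    cat > "$confdir/01-nodoc" <<'EOF'
# Drop documentation, but re-include copyright and changelog files.
path-exclude=/usr/share/doc/*
path-include=/usr/share/doc/*/copyright
path-include=/usr/share/doc/*/changelog*
EOF
}
```

Writing to /etc/dpkg/dpkg.cfg.d needs root; passing a scratch directory as the first argument lets the output be inspected first.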
Archive: http://lists.debian.org/20130515053251.ga1...@gaara.hadrons.org