Hi!

On Tue, 2013-05-14 at 09:30:45 +0200, Helmut Grohne wrote:
> I acknowledge that I am coming late to the party. I dug into the
> discussion referenced from your other mail, but had a hard time finding
> specific arguments. This discussion appears to be a good candidate for
> http://wiki.debian.org/Debate even though it is probably too late to
> start that now.

(Personally I think I'd rather summarize all pros and cons of the
different possible proposals in a single place, though.)

I referenced (in <20130513164531.ga18...@gaara.hadrons.org>) a mail with
a list of some of the advantages:

  [adv] <http://lists.debian.org/debian-devel/2012/06/msg00314.html>

Something not listed there is that I keep finding our handling of when
to ship copyright files (and, to a lesser degree, changelogs) a bit
suspect. We supposedly want all binary packages to ship copyright info,
and have factored extremely common licenses out into base-files, but
some binary packages (built from the same source) are still not
self-contained: they require a depended-on package to be present to
provide the required copyright file, through the usually problematic
symlinked doc dir, just to avoid the duplication.

> On Mon, May 13, 2013 at 03:16:57PM +0200, Guillem Jover wrote:
> > dpkg supports --control-show and --control-list (already in wheezy), which
> > can be used for stuff like:
> > 
> >   $ dpkg-query --control-show dpkg changelog
> > 
> > for installed packages, for example. Or:
> > 
> >   $ dpkg-deb --info foo.deb changelog
> 
> Maybe someone can point me to previous discussion answering aspects of
> the following questions?
> 
> 1) Raphael Hertzog suggested[1] that metadata could be stored
>    compressed. Is that implemented already? (As far as I can see it
>    would be part of file_show, but isn't.) If not, that would cause
>    an increase in installation size. I guess that a typical desktop
>    system would grow by about 50MB.

That's already listed in [adv], as is the fact that these files could
be de-duped (so, yes Michael :). None of the possible db optimizations
have been implemented yet, because they would mostly pay off only once
these new files are installed in the dpkg db, so doing the work while it
was not yet clear whether it would be used seemed a possible waste of
time, or unnecessary added complexity. Also, one of the advantages of
having this in the dpkg db is that these improvements can be deployed
transparently and incrementally, as they'd be an internal
implementation detail.

Checking a currently installed sid system, I get 5 files that are each
repeated more than 10 times:

  $ md5sum /var/lib/dpkg/info/* | sort | uniq -c -d -w 32 | \
    sort -n | grep -E '^ *[0-9]{2,} '
     13 ffed98e3b35997a540e63501bd575415
     60 b8d01f7a8639f5710427ec1aca71c2df
     74 574b713906c216aa174737c0322d1b4b
    709 5d0e769df33e016b3c52a0971e8d258c
    734 dfdc3bad88b6e98080891d6323e2f58e

De-duplicating these would currently save close to 6 MiB (assuming each
file occupies one 4 KiB block); on a more fine-grained filesystem they'd
take up around 200 KiB. Both are quite insignificant.
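
As a rough check of the 6 MiB figure, here is a minimal sketch that
redoes the arithmetic from the counts listed above. It assumes that one
copy of each file has to stay, and that every redundant copy occupies
exactly one 4 KiB block; the variable names are only illustrative.

```shell
# Repetition counts copied from the md5sum/uniq output above.
counts="13 60 74 709 734"
block_kib=4   # assumption: each file occupies one 4 KiB block

redundant=0
for c in $counts; do
  # One copy of each file must stay; the others could be de-duplicated.
  redundant=$((redundant + c - 1))
done

saved_kib=$((redundant * block_kib))
echo "redundant copies: $redundant"
echo "saved: $saved_kib KiB (about $((saved_kib / 1024)) MiB)"
```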

>    Note that the Emdebian crush policy requires copyright files to be
>    compressed and changelogs to be absent.

These files being missing should never be an issue. I can also see us
allowing the compression to be configured when building dpkg, at
runtime, or both, depending on what turns out to make more sense.

> 2) Some users may want to save disk space by elevating dpkg.cfg
>    path-exclude=/usr/share/doc/*/changelog*
>    path-exclude=/usr/share/doc/*/copyright
>    This saves about 50MB on a desktop system. Is there a feature to
>    systematically drop meta data? Being in the ball park of less than a
>    percent of the installation size I am not sure whether this is worth
>    the effort.

Ansgar asked the same, and yeah, I'm not sure that's worth it either,
even less so once de-dup and compression are in place; what I've
usually seen is people discarding the doc dir except for the copyright
and changelog files. But I'd be very open to adding options to drop
these at install time if we see that it does make a difference, or that
it makes sense in places where it was possible before, because I can
see how this could be considered a regression.
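
For reference, the "discard the doc dir except for copyright and
changelog" setup mentioned above is typically expressed with dpkg's
existing path-exclude and path-include options. The sketch below only
writes such a fragment into a scratch directory; the directory and the
file name 01_nodoc are made up for illustration, and on a real system
the fragment would live under /etc/dpkg/dpkg.cfg.d/.

```shell
# Hypothetical scratch location, so the sketch needs no root access.
conf_dir="${conf_dir:-./dpkg.cfg.d}"
mkdir -p "$conf_dir"

# Drop everything under /usr/share/doc, then re-include the copyright
# and changelog files.
cat > "$conf_dir/01_nodoc" <<'EOF'
path-exclude=/usr/share/doc/*
path-include=/usr/share/doc/*/copyright
path-include=/usr/share/doc/*/changelog*
EOF

cat "$conf_dir/01_nodoc"
```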

Thanks,
Guillem

