We currently have 10371 ChangeLog files, totalling more than 25 MB.
Of these, 1365 (13%) are 4096 bytes or larger, totalling 12 MB.
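
For anyone who wants to reproduce these figures on their own tree, here
is a rough Python sketch. The function name changelog_stats and the
4096-byte default are only illustrative; this is not necessarily how the
numbers above were obtained.

```python
import os


def changelog_stats(root, threshold=4096):
    """Walk root and return (count, total_bytes, big_count, big_bytes).

    "big" means a ChangeLog whose size is >= threshold bytes.
    """
    count = total = big_count = big_total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        if "ChangeLog" in filenames:
            size = os.path.getsize(os.path.join(dirpath, "ChangeLog"))
            count += 1
            total += size
            if size >= threshold:
                big_count += 1
                big_total += size
    return count, total, big_count, big_total
```

Calling changelog_stats("/usr/portage") would scan a tree in that
location; the path is an assumption, adjust it to wherever your tree
lives.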

The rsync call made by "emerge --sync" has "--whole-file" among its
options, which means the whole file is transferred whenever it changes.

To make things worse, the biggest ChangeLogs are (oh, surprise) the ones
that change most frequently (age also counts). Here are the 25 largest,
with sizes in bytes:

110062  ./x11-base/xorg-x11/ChangeLog
96906   ./sys-devel/gcc/ChangeLog
86916   ./sys-libs/glibc/ChangeLog
72429   ./net-www/apache/ChangeLog
65331   ./sys-apps/baselayout/ChangeLog
61801   ./media-video/mplayer/ChangeLog
57688   ./dev-db/mysql/ChangeLog
53938   ./sys-kernel/gentoo-sources/ChangeLog
53810   ./net-im/gaim/ChangeLog
53173   ./www-client/mozilla/ChangeLog
51891   ./dev-php/mod_php/ChangeLog
48127   ./dev-db/postgresql/ChangeLog
47014   ./sys-devel/binutils/ChangeLog
46742   ./kde-base/kdelibs/ChangeLog
45370   ./dev-lang/perl/ChangeLog
44998   ./sys-kernel/mm-sources/ChangeLog
41010   ./kde-base/kdebase/ChangeLog
37644   ./www-client/mozilla-firefox/ChangeLog
37524   ./net-fs/samba/ChangeLog
36411   ./mail-mta/postfix/ChangeLog
35269   ./app-office/openoffice-ximian/ChangeLog
34890   ./app-office/openoffice/ChangeLog
34773   ./sys-kernel/mips-sources/ChangeLog
33245   ./media-sound/xmms/ChangeLog
32769   ./dev-util/subversion/ChangeLog
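
A list like the one above could be produced with something along these
lines. This is only a sketch; largest_changelogs is a made-up name and
the list above may well have been generated differently.

```python
import heapq
import os


def largest_changelogs(root, n=25):
    """Return the n largest ChangeLog files under root as (size, path) pairs,
    biggest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "ChangeLog" in filenames:
            path = os.path.join(dirpath, "ChangeLog")
            sizes.append((os.path.getsize(path), path))
    return heapq.nlargest(n, sizes)
```

Printing each pair as "size  path" gives output in the same shape as the
list above.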


The information contained in the ChangeLogs is essential and must be
kept, but forcing users to download all that data is not optimal.

That said, I can see only two ways to reduce the size of the ChangeLog
files (a centralized one is obviously not viable):

1) bzip2 them in some way.

   Pros:
   - whole history is available
   - much lower file transfer size
   Cons:
   - grepping them needs dedicated tools
   - never-ending CVS issues; an alternate non-CVS tree may be needed
   - vim already has a wrapper to read bz2 files, other editors do not
   - would even a 247-byte file need to be compressed?
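
To give an idea of what option 1 implies on the tooling side, here is a
minimal Python sketch using the standard bz2 module. The helper names
compress_changelog and grep_bz2 are hypothetical, and nothing like this
exists in portage as far as this proposal is concerned.

```python
import bz2


def compress_changelog(path):
    """Write path + ".bz2" next to the original file.

    Returns (raw_size, compressed_size) in bytes so the gain can be checked.
    """
    with open(path, "rb") as src:
        data = src.read()
    packed = bz2.compress(data)
    with open(path + ".bz2", "wb") as dst:
        dst.write(packed)
    return len(data), len(packed)


def grep_bz2(path, needle):
    """Yield the lines of a bzip2-compressed text file that contain needle."""
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if needle in line:
                yield line.rstrip("\n")
```

Comparing the two sizes returned by compress_changelog on a very small
file is a quick way to see whether compressing it is worth it at all,
since the bzip2 container has a fixed overhead.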

2) "Rotate" the ChangeLogs, keeping only the most recent changes, up to
   a size of 4000 or [choose a preferred size here] bytes.
   This would save only about 7 MB of data (max size < 4096).

   Pros:
   - still easily readable and parseable
   - saves download where it matters most
   - affects only 13% of the current tree
   Cons:
   - needs changes in repoman/echangelog to cut the ChangeLog at the
     right position
   - grepping the ChangeLog is impossible (for the cut data)
   - ChangeLog is cut at _no_ definite point in time (maybe yesterday)
   - whole history available only on viewcvs or with hacks like a
     sys-apps/ChangeLogs package
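
The cut in option 2 could look roughly like the sketch below. It rests
on assumptions that are mine and not part of repoman or echangelog: the
newest changes sit at the top of the file, the first blank-line-separated
block is the header, and it is acceptable to cut between any two such
blocks. The name rotate_changelog is hypothetical.

```python
def rotate_changelog(text, limit=4000):
    """Return text trimmed to at most limit bytes, dropping the oldest blocks.

    Assumes newest-first ordering and blank-line-separated blocks. The first
    block (the header) is always kept, even if it alone exceeds limit.
    """
    if len(text.encode("utf-8")) <= limit:
        return text  # small enough already, leave it untouched

    blocks = text.split("\n\n")
    kept = [blocks[0]]
    size = len(blocks[0].encode("utf-8"))
    for block in blocks[1:]:
        cost = 2 + len(block.encode("utf-8"))  # blank-line separator + block
        if size + cost + 1 > limit:  # +1 reserves room for the final newline
            break
        kept.append(block)
        size += cost

    result = "\n\n".join(kept)
    return result if result.endswith("\n") else result + "\n"
```

Because the cut happens wherever the byte budget runs out, the sketch
also shows the "no definite point in time" problem listed in the cons.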

Thoughts? Is it doable in some way?
-- 
gentoo-dev@gentoo.org mailing list
