On Sun, Jul 24, 2016 at 01:35:13PM +0200, Eduard Bloch wrote:
> Hello,
> * Julian Andres Klode [Sun, Jul 24 2016, 01:24:12PM]:
> > Control: tag -1 moreinfo
> >
> > On Sun, Jul 24, 2016 at 12:43:23PM +0200, Eduard Bloch wrote:
> > > Package: apt
> > > Version: 1.3~pre2
> > > Severity: minor
> > >
> > > Hello,
> > >
> > > since Contents file handling was added recently, the processing of
> > > these files seems to be very slow. It takes about two minutes (guessed,
> > > not measured) where all other work is done within the first ~10 seconds.
> > >
> > > <first analysis>
> > > I think the basic problem here is the massive size of the data in the
> > > Index files - they are already big and the compression ratio is very
> > > high. Uncompressed versions of both amd64 and i386 add up to about one
> > > gigabyte! OTOH when I zcat them both, it takes just about 5 seconds!
> > > So I guess the problem is the amount of data that needs to be rotated
> > > while patching.
> > > I measured a bit how ed performs and it takes about 11 seconds for
> > > Contents-amd64.gz (and about 166k of patch lines in a combined patch).
> > > The patch was made beforehand from the series of related pdiff files,
> > > of course.
> >
> > Not sure what is happening on your side, but APT should normally store
> > Contents files using LZ4 compression, not gzip, unless you force it to
> > do otherwise.
>
> Hm? This is the first time I have come across LZ4, and I have no idea
> what you mean.
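(As background for readers of this report: a pdiff is an ed-style diff
script, whose commands are ordered from the bottom of the file upwards so
that earlier line numbers stay valid while patching. A minimal Python
sketch of the mechanics - `apply_ed_script` is a hypothetical helper for
illustration, not APT's actual C++ implementation - could look like this:)

```python
import re

# An ed command is "N[,M]op", e.g. "5,7c", "12d", "3a".
CMD = re.compile(r"^(\d+)(?:,(\d+))?([acd])$")

def apply_ed_script(lines, script):
    """Apply an ed-style script (the format used by pdiffs) to `lines`."""
    out = list(lines)
    cmds = []
    i = 0
    # First parse the script into (start, end, op, payload) tuples.
    while i < len(script):
        m = CMD.match(script[i])
        if not m:
            raise ValueError("bad ed command: %r" % script[i])
        start = int(m.group(1))
        end = int(m.group(2) or start)
        op = m.group(3)
        i += 1
        payload = []
        if op in "ac":
            # "a" and "c" are followed by text lines, terminated by ".".
            while script[i] != ".":
                payload.append(script[i])
                i += 1
            i += 1  # skip the terminating "."
        cmds.append((start, end, op, payload))
    # Apply in script order; ed scripts run bottom-up, so edits below
    # never invalidate the line numbers of edits above.
    for start, end, op, payload in cmds:
        if op == "d":
            del out[start - 1:end]
        elif op == "c":
            out[start - 1:end] = payload
        else:  # "a": insert after line `start`
            out[start:start] = payload
    return out
```

Applying 166k of such lines to a near-gigabyte Contents file is where the
time goes, which is why the compression of the input/output streams matters.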
We introduced LZ4 support 6 months ago, on January 15. Maybe the dynamic
recompression code fails to recompress your gzip files when applying pdiffs
to them (it should read a .gz and write out a patched .lz4; afterwards it
reads .lz4 and writes .lz4, see the end of this email).

What you can try: (1) make a backup of lists/, delete the Contents.gz files
in there, and run update again -> you should now get Contents.lz4 files in
the lists dir.

>
> > We specifically switched to LZ4 to solve this issue.
> >
> > Does your system not use .lz4 compressed Contents files?
> >
> > > APT::Compressor::lz4::Binary "false";
> >
> > My system says:
> >
> > APT::Compressor::lz4::Binary "lz4";
>
> Shall I change it and report back in a couple of days?

That should not make a difference, as we use the library anyway (which we
depend on).

>
> But anyhow, I am wondering... the obvious guess is that the problem is
> the complexity (CPU time or memory) and not IO; how is extra compression
> supposed to fix it? IMHO it would rather make it worse.
>

On initial download, we decompress the gzip file and recompress it with
lz4. This is obviously a bit slower than just writing out the gzip
compressed file as is. The speed-up comes with pdiff: before 1.2, we would
read the .gz file, apply any patches, and write the result to a gzip
compressing output stream. The output stream now uses LZ4 all the time (and
input uses lz4 too once the lz4 recompressed file exists). Compressing with
LZ4 instead of gzip results in a significant speedup (10x-100x or
something, I'm not really that sure).

-- 
Debian Developer - deb.li/jak | jak-linux.org - free software dev

When replying, only quote what is necessary, and write each reply directly
below the part(s) it pertains to (`inline'). Thank you.
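(Sketched as shell commands, the test suggested above would be roughly as
follows. The paths assume the default lists directory /var/lib/apt/lists;
the exact steps are illustrative, run as root and keep the backup until
everything checks out:)

```shell
# (1) Back up the lists directory first.
cp -a /var/lib/apt/lists /var/lib/apt/lists.bak

# Remove only the gzip-compressed Contents files.
rm -f /var/lib/apt/lists/*Contents*.gz

# Re-run update; the re-fetched Contents should now be stored as .lz4.
apt update

# Verify that LZ4-compressed Contents files now exist.
ls /var/lib/apt/lists/*Contents*.lz4
```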

