Re: big Packages.gz file
> " " == Brian May <[EMAIL PROTECTED]> writes:

> "zhaoway" == zhaoway <[EMAIL PROTECTED]> writes:

zhaoway> This is only a small part of the whole story, IMHO. See
zhaoway> my other email replying you. ;)

>>> Maybe there could be another version of Packages.gz without
>>> the extended descriptions -- I imagine they would take
>>> something like 33% of the Packages file, in line count at
>>> least.

zhaoway> Exactly. DIFF or RSYNC method of APT (as Goswin pointed
zhaoway> out), or just separate Descriptions out (as I pointed out
zhaoway> and you got it too), nearly 66% of the bits are
zhaoway> saved. But this is only a hack, albeit efficient.

> At the risk of getting flamed, I investigated the possibility
> of writing an apt-get method to support rsync. I would use this
> to access an already existing private mirror, and not the main
> Debian archive. Hence the server load issue is not a
> problem. The only problem I have is downloading several megs of
> index files every time I want to install a new package (often
> under 100kb) from unstable, over a volume charged 28.8 kbps PPP
> link, using apt-get[1].

I tried the same, but I used the copy method as a template, which is rather bad. I should have used http as the starting point.

Can you send me your patch, please?

> I think (if I understand correctly) that I found three problems
> with the design of apt-get:

> 1. It tries to download the compressed Packages file, and has
> no way to override it with the uncompressed file. I filed a bug
> report against apt-get on this, as I believe this will also be
> a problem with protocols like rproxy too.

> 2. apt-get tries to be smart and passes the method a
> destination file name that is only a temporary file, and not
> the final file. Hence, rsync cannot make a comparison between
> local and remote versions of the file.

I wrote to the deity mailing list concerning those two problems, with two possible solutions. So far the only answer I got was "NO, we don't want rsync", after pressing the issue here on debian-devel.

> 3. Instead, rsync creates its own temporary file while
> downloading, so apt-get cannot display the progress of the
> download operation because as far as it is concerned the
> destination file is still empty.

Hmm, isn't there an informational message you can output to hint at the progress? We would have to patch rsync to generate that style of progress output, or fork rsync, parse its output and pass on altered output.

> I think the only way to fix both 2 and 3 is to allow some
> coordination between apt-get and rsync where to put the
> temporary file and where to find the previous version of the
> file.

Doing some more thinking, I like the second solution to the problem more and more:

1. Include a template (some file that apt-get thinks matches best) in the fetch request. The rsync method can then copy that file to the destination and rsync onto it. This would be the uncompressed Packages file, a previous deb or the old source.

2. Return whether the file is compressed or not simply by passing back the destination filename with the appropriate extension (.gz). So the destination filename is altered to reflect the file format.

MfG
Goswin
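The two-step proposal above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from apt: the function names are hypothetical, and the `--inplace` option belongs to current rsync releases, which postdate this thread.

```python
import shutil
from pathlib import Path


def seed_destination(template: Path, destination: Path) -> bool:
    """Step 1: copy the best-matching old file to the destination.

    With the old content already in place, rsync has a basis file to
    compare against. Returns False when there is no template, in which
    case a full download is needed.
    """
    if not template.is_file():
        return False
    shutil.copyfile(template, destination)
    return True


def rsync_command(remote_url: str, destination: Path) -> list[str]:
    """Build the rsync call for the seeded destination.

    --inplace asks rsync to update the destination file directly rather
    than writing to a temporary file of its own (assumption: a modern
    rsync is installed; this option did not exist when the thread was
    written).
    """
    return ["rsync", "--inplace", remote_url, str(destination)]


def reported_filename(destination: Path, compressed: bool) -> str:
    """Step 2: signal the file format through the returned filename."""
    return f"{destination}.gz" if compressed else str(destination)
```

The point of seeding is that the method never needs to know the final file name: it only needs some old version to start from, which the caller picks.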
Re: big Packages.gz file
> "Brian" == Brian May <[EMAIL PROTECTED]> writes:

Brian> Note: [1] Normally I try to find the files manually via
Brian> lynx, but right at the moment this is rather difficult, as
Brian> I seem to try numerous directories but not get the expected
Brian> result. Some packages

Damn - sent that message before I had finished typing :-(

Anyway, I meant to say "some packages are hard to find manually while they haven't all been moved to the package pool system yet".

--
Brian May <[EMAIL PROTECTED]>
Re: big Packages.gz file
> "zhaoway" == zhaoway <[EMAIL PROTECTED]> writes:

zhaoway> This is only a small part of the whole story, IMHO. See
zhaoway> my other email replying you. ;)

>> Maybe there could be another version of Packages.gz without the
>> extended descriptions -- I imagine they would take something
>> like 33% of the Packages file, in line count at least.

zhaoway> Exactly. DIFF or RSYNC method of APT (as Goswin pointed
zhaoway> out), or just separate Descriptions out (as I pointed out
zhaoway> and you got it too), nearly 66% of the bits are
zhaoway> saved. But this is only a hack, albeit efficient.

At the risk of getting flamed, I investigated the possibility of writing an apt-get method to support rsync. I would use this to access an already existing private mirror, and not the main Debian archive. Hence the server load issue is not a problem. The only problem I have is downloading several megs of index files every time I want to install a new package (often under 100kb) from unstable, over a volume charged 28.8 kbps PPP link, using apt-get[1].

I think (if I understand correctly) that I found three problems with the design of apt-get:

1. It tries to download the compressed Packages file, and has no way to override it with the uncompressed file. I filed a bug report against apt-get on this, as I believe this will also be a problem with protocols like rproxy.

2. apt-get tries to be smart and passes the method a destination file name that is only a temporary file, and not the final file. Hence, rsync cannot make a comparison between local and remote versions of the file.

3. Instead, rsync creates its own temporary file while downloading, so apt-get cannot display the progress of the download operation because as far as it is concerned the destination file is still empty.

I think the only way to fix both 2 and 3 is to allow some coordination between apt-get and rsync about where to put the temporary file and where to find the previous version of the file.

Note: [1] Normally I try to find the files manually via lynx, but right at the moment this is rather difficult, as I seem to try numerous directories but not get the expected result. Some packages

--
Brian May <[EMAIL PROTECTED]>
Re: big Packages.gz file
> " " == Brian May <[EMAIL PROTECTED]> writes:

> "sluncho" == sluncho <[EMAIL PROTECTED]> writes:

sluncho> How hard would it be to make daily diffs of the Package
sluncho> file? Most people running unstable update every other day
sluncho> and this will require downloading and applying only a
sluncho> couple of diff files.

sluncho> The whole process can be easily automated.

> Sounds remarkably like the process (weekly not daily though) to
> distribute Fidonet nodelist diffs. Also similar to kernel
> diffs, I guess, too.

> Seems a good idea to me (until better solutions like rproxy are
> better implemented), but you have to be careful not to
> apply diffs in the wrong order.
> -- Brian May <[EMAIL PROTECTED]>

Or missing one, or having a corrupted file to begin with, or any other of 1000 possibilities. Also, mirrors will always lag behind, have erratic timestamps on those files, and so on. I think it would become a mess pretty soon.

The nice thing about rsync is that it's self-repairing. It's also more efficient than a normal diff.

MfG
Goswin
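The failure modes listed above (wrong order, a missing diff, a corrupted starting file) could in principle be detected if every diff named the checksum of the file it applies to and of the file it produces. The sketch below is a hypothetical illustration of that bookkeeping only. The `Diff` record and `plan_update` are invented for this example, and nothing in the thread says the archive publishes such metadata.

```python
import hashlib
from typing import NamedTuple, Optional


class Diff(NamedTuple):
    name: str        # file name of the diff on the mirror (hypothetical)
    base_sha: str    # checksum of the Packages file the diff applies to
    result_sha: str  # checksum of the Packages file it produces


def sha(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def plan_update(local_sha: str, latest_sha: str,
                diffs: list[Diff]) -> Optional[list[str]]:
    """Return the diffs to apply, in order, or None if no chain exists.

    None means the caller should fall back to a full download, e.g.
    because a diff is missing or the local file is corrupted.
    """
    by_base = {d.base_sha: d for d in diffs}
    plan: list[str] = []
    seen: set[str] = set()
    current = local_sha
    while current != latest_sha:
        step = by_base.get(current)
        if step is None or current in seen:  # gap or loop in the chain
            return None
        seen.add(current)
        plan.append(step.name)
        current = step.result_sha
    return plan
```

Because the order is derived from the checksums rather than from file names or timestamps, a lagging mirror with erratic timestamps produces a fallback to the full file instead of a silently wrong result.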
Re: big Packages.gz file
> "sluncho" == sluncho <[EMAIL PROTECTED]> writes:

sluncho> How hard would it be to make daily diffs of the Package
sluncho> file? Most people running unstable update every other day
sluncho> and this will require downloading and applying only a
sluncho> couple of diff files.

sluncho> The whole process can be easily automated.

Sounds remarkably like the process (weekly not daily though) to distribute Fidonet nodelist diffs. Also similar to kernel diffs, I guess, too.

Seems a good idea to me (until better solutions like rproxy are better implemented), but you have to be careful not to apply diffs in the wrong order.

--
Brian May <[EMAIL PROTECTED]>
Re: big Packages.gz file
On Tue, Jan 09, 2001 at 11:40:01PM +1100, Hamish Moffatt wrote: > On Tue, Jan 09, 2001 at 06:04:58PM +0900, Miles Bader wrote: > > Hamish Moffatt <[EMAIL PROTECTED]> writes: > > > What is the real problem with the large package files? They take a long > > > time to download, but so do emacs and other bloatware. > > The packages file gets downloaded _every single time_ you do an update, > > and for those of us with a slow modem link, that really sucks. > > True enough. I haven't really been following the discussion, to be honest. > > Maybe there could be another version of Packages.gz without the > extended descriptions -- I imagine they would take something like > 33% of the Packages file, in line count at least. Please excuse me if I am jumping into the discussion unprepared or if this has already been mentioned. How hard would it be to make daily diffs of the Package file? Most people running unstable update every other day and this will require downloading and applying only a couple of diff files. The whole process can be easily automated. Sluncho <[EMAIL PROTECTED]>
Re: big Packages.gz file
From: Hamish Moffatt <[EMAIL PROTECTED]>
Subject: Re: big Packages.gz file
Date: Tue, 9 Jan 2001 19:59:13 +1100

> On Tue, Jan 09, 2001 at 03:04:10AM +0800, zhaoway wrote:
> > A big package index IMHO is the current bottleneck of Debian package system.
>
> What is the real problem with the large package files? They take a long
> time to download, but so do emacs and other bloatware.

The problem, IMHO, is this. ;) Every once in a while, when you want to update a package to the newest version, you have to update the package index first. And that is not absolutely necessary if you look into the problem. And the size of the package index is constantly growing.

With Emacs, nearly all of the bits are necessary for the functionality, and you don't download it for every trivial update task. And it is not growing in size as rapidly as the package index is.

Looking further, if we allow translation of the Packages index, it could get even bigger. Or if we allow multiple versions of a package into the package pool (as Manoj mentioned in another thread), a big Packages index could be even more troublesome.

Hope I made myself clearer. ;) And thank you for discussing this with me! ;)
Re: big Packages.gz file
From: Hamish Moffatt <[EMAIL PROTECTED]>
Subject: Re: big Packages.gz file
Date: Tue, 9 Jan 2001 23:40:01 +1100

> On Tue, Jan 09, 2001 at 06:04:58PM +0900, Miles Bader wrote:
> > The packages file gets downloaded _every single time_ you do an update,
> > and for those of us with a slow modem link, that really sucks.

This is only a small part of the whole story, IMHO. See my other email replying to you. ;)

> Maybe there could be another version of Packages.gz without the
> extended descriptions -- I imagine they would take something like
> 33% of the Packages file, in line count at least.

Exactly. With a DIFF or RSYNC method for APT (as Goswin pointed out), or by just separating the Descriptions out (as I pointed out, and you got it too), nearly 66% of the bits are saved. But this is only a hack, albeit an efficient one, because it does not solve the package pool's problem within the package pool system itself. It works on the protocol and client tool side.

1) AIUI, the package pool should be a storage system with a smart algorithm for deleting packages that no distribution or other package references. (Garbage collection by reference counts.)

2) A distribution, putting aside the work of our honoured release manager, should be a partial package index listing. Thus it should be separated from the storage system. The current ``testing'' distribution doesn't do it well enough. (Thus, it has a regulation on upload frequency.)

With these two things in mind, RSYNC can help very little, and the package pool's indexing problem remains. In my previous letters I tried to start a discussion of one of my humble attempts to help. ;)

As soon as I have enough time, and enough discussion, I may write a more prepared document. But I need discussion first. Thanks!
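Point 1 above, garbage collection by reference counts, might look something like the following minimal sketch. It assumes each distribution is simply a set of pool file names, which is a simplification made up for this example. References between packages, which the message also mentions, are left out.

```python
from collections import Counter


def collectable(pool: set[str], distributions: dict[str, set[str]]) -> set[str]:
    """Return the pool entries that no distribution listing references.

    `pool` holds every file name stored in the pool; `distributions`
    maps a distribution name to the partial index listing (a set of
    file names) that defines it.
    """
    refs: Counter = Counter()
    for listing in distributions.values():
        refs.update(listing)  # one reference per listing that names the file
    return {name for name in pool if refs[name] == 0}
```

With a pool of `foo_1.0`, `foo_1.1` and `bar_2.0`, where stable lists `foo_1.0` and unstable lists `foo_1.1`, only `bar_2.0` would be eligible for deletion.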
Re: big Packages.gz file
On Tue, Jan 09, 2001 at 06:04:58PM +0900, Miles Bader wrote: > Hamish Moffatt <[EMAIL PROTECTED]> writes: > > What is the real problem with the large package files? They take a long > > time to download, but so do emacs and other bloatware. > > Yeah, but how often do you download emacs? Never, I wouldn't touch that thing with a 40 foot barge pole! > The packages file gets downloaded _every single time_ you do an update, > and for those of us with a slow modem link, that really sucks. True enough. I haven't really been following the discussion, to be honest. Maybe there could be another version of Packages.gz without the extended descriptions -- I imagine they would take something like 33% of the Packages file, in line count at least. Hamish -- Hamish Moffatt VK3SB <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>
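The 33% figure above is a guess, and it could be checked against a real Packages file with something like the sketch below. It assumes the usual control-file layout, in which the extended description is the run of continuation lines (starting with a space or tab) that follows the `Description:` field. The function name is made up for this example.

```python
def strip_extended_descriptions(packages_text: str) -> str:
    """Drop extended descriptions but keep the one-line synopsis.

    Continuation lines that belong to other fields are kept.
    """
    kept = []
    in_description = False
    for line in packages_text.splitlines():
        if line.startswith((" ", "\t")):
            # Continuation line: skip it only inside a Description field.
            if not in_description:
                kept.append(line)
            continue
        in_description = line.startswith("Description:")
        kept.append(line)
    return "\n".join(kept) + "\n"


def line_share_removed(packages_text: str) -> float:
    """Fraction of lines that the stripped version no longer contains."""
    before = len(packages_text.splitlines())
    after = len(strip_extended_descriptions(packages_text).splitlines())
    return (before - after) / before if before else 0.0
```

Running `line_share_removed` over an uncompressed Packages file would show how close the line-count estimate is. The saving in compressed bytes may differ from the saving in lines.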
Re: big Packages.gz file
Hamish Moffatt <[EMAIL PROTECTED]> writes: > What is the real problem with the large package files? They take a long > time to download, but so do emacs and other bloatware. Yeah, but how often do you download emacs? The packages file gets downloaded _every single time_ you do an update, and for those of us with a slow modem link, that really sucks. -Miles -- Love is a snowmobile racing across the tundra. Suddenly it flips over, pinning you underneath. At night the ice weasels come. --Nietzsche
Re: big Packages.gz file
On Tue, Jan 09, 2001 at 03:04:10AM +0800, zhaoway wrote: > A big package index IMHO is the current bottleneck of Debian package system. What is the real problem with the large package files? They take a long time to download, but so do emacs and other bloatware. Hamish -- Hamish Moffatt VK3SB <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>
Re: Linux Gazette [Was: Re: big Packages.gz file]
On Mon, Jan 8, 2001 at 18:20:16 +0100 (+), Andreas Fuchs wrote:
> On 2001-01-07, Goswin Brederlow
> <[EMAIL PROTECTED]> wrote:
> > zhaoway> 1) It prevent many more packages to come into Debian, for
> > zhaoway> example, Linux Gazette are now not present newest issues
> > zhaoway> in Debian. People occasionally got fucked up by packages
>
> > Any reasons why the Linux gazette is not present anymore?
> > And is there a virtual package for the Linux gazette that always
> > depends on the newest version?
>
> Another solution would be to have only an installer which installs the
> latest version of the LG from a server that keeps it. Keeps the
> Packages.gz file clean, and LG readers happy.
>
> Or am I missing something?

To answer the questions:

a) It is present, but I haven't updated it in a while (busy). Wouter Verhelst has offered to take over the package, but he's new to packaging so things are taking a bit of time.

b) Nope - I haven't done a virtual "latest" package yet. There is a bug about it, I think (or Wouter suggested it).

c) Personally, I like the LG since I find the issues useful - I found useful articles in all the ones I read. Unfortunately, since I left uni I haven't been sufficiently bored to remember to download and read them (and hence to package them).

d) I was hoping the "data" section of Debian would get into policy so I could move the packages there and out of main.

Adrian

Email: [EMAIL PROTECTED]
Windows NT - Unix in beta-testing. GPG/PGP keys available on public key servers
Debian GNU/Linux -*- By professionals for professionals -*- www.debian.org
Re: big Packages.gz file
Hello,

On Tue, Jan 09, 2001 at 03:04:10AM +0800, zhaoway wrote:
> * To separate Packages.gz to be along with each package as another separate
> file. Caesar's belong to Caesar. ;)
> i.e., each pkg_ver-sub_arch.deb with a pkg_ver-sub_arch.idx

No, that's not a win. You would end up checking time stamps for thousands of files in case of an update. I liked the idea of alphabetical splitting into Packages-[a-z0-9].gz

> * At the same time, provide a big Packages.gz by collecting these small
> files for compatibility. Or, maybe even a trimmed Packages.gz by removing
> all of the Description:s.

Yup, just keep a copy of Packages.gz and provide backwards compatibility.

Bastian Kleineidam
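A rough sketch of the alphabetical split, assuming one index file per leading character of the package name. The `Packages-<char>` naming follows the pattern mentioned above, while the function names are invented for this example.

```python
from collections import defaultdict


def split_by_initial(package_names: list[str]) -> dict[str, list[str]]:
    """Group package names into one index file per leading character."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for name in sorted(package_names):
        buckets[f"Packages-{name[0].lower()}"].append(name)
    return dict(buckets)


def files_to_refetch(changed_packages: set[str]) -> set[str]:
    """Index files a client must download again after the given changes."""
    return {f"Packages-{name[0].lower()}" for name in changed_packages}
```

The split only saves bandwidth when the changes cluster in a few buckets. If one package per leading character has changed since the last update, every split file has to be fetched again.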
Re: big Packages.gz file
On Sun, Jan 07, 2001 at 05:18:02PM -0500, Chris Gray wrote:
> > Brian May writes:
> bm> What do large packages have to do with the size of the index file,
> bm> Packages?
>
> I think the point was that every package adds about 30-45 lines to the
> Packages file. You don't need to download any of the Linux Gazette to
> have the 33 lines each issue takes up in the Packages file.

A big package index IMHO is the current bottleneck of the Debian package system. While most people are more interested in RSYNC as the cure, IMHO RSYNC is overkill and a non-clean kill. It prevents easy mirroring of Debian by requiring RSYNC service on the mirror system, and it won't solve the pool's problem, only hack around it. ;)

OTOH, a relatively straightforward solution is:

* To separate Packages.gz so that it goes along with each package as another separate file. Caesar's belongs to Caesar. ;)
  i.e., each pkg_ver-sub_arch.deb with a pkg_ver-sub_arch.idx

* At the same time, provide a big Packages.gz by collecting these small files for compatibility. Or, maybe even a trimmed Packages.gz by removing all of the Description:s.

* Optionally, provide hard links or symlinks along with each package, e.g.,
  pkg_[stable|unstable|testing]_arch.idx -> pkg_ver-sub_arch.idx
  Note: this won't hurt mirrors; OTOH it could even help partial mirrors.

* And enable multiple versions of a package in the package pool.

This way, a general package index is optional. And release management could move towards those more fine-tuned task-* like packages. No loss. ;)

Just for discussion, I would be glad to hear criticism. ;)
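The compatibility step in the list above, building one big Packages file from the small per-package files, could be as simple as the sketch below. It assumes every `.idx` file holds exactly one control stanza and that all of them sit in one directory. Both are simplifications made for this example.

```python
from pathlib import Path


def build_packages(pool_dir: Path) -> str:
    """Concatenate all *.idx stanzas into one Packages-style text.

    Stanzas are joined with a blank line between them, in file-name
    order.
    """
    stanzas = []
    for idx_file in sorted(pool_dir.glob("*.idx")):
        stanzas.append(idx_file.read_text().strip() + "\n")
    return "\n".join(stanzas)
```

A client that understands the per-package files would skip this aggregate entirely, while older tools would keep reading the generated file.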
Re: Linux Gazette [Was: Re: big Packages.gz file]
On 2001-01-07, Goswin Brederlow <[EMAIL PROTECTED]> wrote: > zhaoway> 1) It prevent many more packages to come into Debian, for > zhaoway> example, Linux Gazette are now not present newest issues > zhaoway> in Debian. People occasionally got fucked up by packages > Any reasons why the Linux gazette is not present anymore? > And is there a virtual package for the Linux gazette that allays > depends on the newest version? Another solution would be to have only an installer which installs the latest version of the LG from a server that keeps it. Keeps the Packages.gz file clean, and LG readers happy. Or am I missing something? -- Andreas Fuchs, <[EMAIL PROTECTED]>, <[EMAIL PROTECTED]>, antifuchs Hail RMS! Hail Cthulhu! Hail Eris! All hail Discordia!
Linux Gazette [Was: Re: big Packages.gz file]
> " " == Chris Gray <[EMAIL PROTECTED]> writes:

> Brian May writes:
> "zhaoway" == zhaoway <[EMAIL PROTECTED]> writes:

zhaoway> 1) It prevent many more packages to come into Debian, for
zhaoway> example, Linux Gazette are now not present newest issues
zhaoway> in Debian. People occasionally got fucked up by packages

Any reason why the Linux Gazette is not present anymore?

And is there a virtual package for the Linux Gazette that always depends on the newest version?

MfG
Goswin
Re: big Packages.gz file
> Brian May writes: > "zhaoway" == zhaoway <[EMAIL PROTECTED]> writes: zhaoway> 1) It prevent many more packages to come into Debian, for zhaoway> example, Linux Gazette are now not present newest issues zhaoway> in Debian. People occasionally got fucked up by packages zhaoway> like anachism-doc because the precious band-width. And zhaoway> some occasional discussion on L10N packages to distrub zhaoway> others life who don't need it. bm> ...only if you download and install the package in question. bm> What do large packages have to do with the size of the index file, bm> Packages? I think the point was that every package adds about 30-45 lines to the Packages file. You don't need to download any of the Linux Gazette to have the 33 lines each issue takes up in the Packages file. Cheers, Chris -- Got jag? http://www.tribsoft.com
Re: big Packages.gz file
> On 2001-01-05, Brian May <[EMAIL PROTECTED]> wrote:
> > What do large packages have to do with the size of the index file,
> > Packages?

Andreas Fuchs <[EMAIL PROTECTED]> wrote:
> They waste one byte per multiple of 10 bytes of package size. (-;

You mean one byte per order of magnitude of package size. ;)

> Bad joke? So sue me.

Yes, very bad. I couldn't resist correcting, which makes me at least as bad.

--
Sam Couter | Internet Engineer | http://www.topic.com.au/
[EMAIL PROTECTED]| tSA Consulting |
OpenPGP key available on key servers
OpenPGP fingerprint: A46B 9BB5 3148 7BEA 1F05 5BD5 8530 03AE DE89 C75C
Re: big Packages.gz file
On 2001-01-05, Brian May <[EMAIL PROTECTED]> wrote: > What do large packages have to do with the size of the index file, > Packages? They waste one byte per multiple of 10 bytes of package size. (-; Bad joke? So sue me. -- Andreas Fuchs, <[EMAIL PROTECTED]>, <[EMAIL PROTECTED]>, antifuchs Hail RMS! Hail Cthulhu! Hail Eris! All hail Discordia!
Re: big Packages.gz file
> "zhaoway" == zhaoway <[EMAIL PROTECTED]> writes: zhaoway> 1) It prevent many more packages to come into Debian, for zhaoway> example, Linux Gazette are now not present newest issues zhaoway> in Debian. People occasionally got fucked up by packages zhaoway> like anachism-doc because the precious band-width. And zhaoway> some occasional discussion on L10N packages to distrub zhaoway> others life who don't need it. ...only if you download and install the package in question. What do large packages have to do with the size of the index file, Packages? zhaoway> 2) They have a FIX TIME problem. I.e., if you don't RSYNC zhaoway> or DIFF for a long time, they won't save you extra zhaoway> bandwidth. While my approach do. You only download what has changed. Nothing more, nothing less. I could equally argue, if you wait a while, then exactly one package in each section will change, causing you to have to re-download all Index files. I am not trying to argue that your method is a bad idea, but please try and get your facts straight first. Now back on topic: another similar alternative to rsync might be protocols like rproxy, which add rsync capabilities to HTTP. Apparently the authors want to include functionality (not sure what time frame they are talking about here) in Squid and Apache. This would mean rsync support in apt-get may be less important, you just need to force it to download Packages not Packages.gz. -- Brian May <[EMAIL PROTECTED]>