Re: big Packages.gz file

2001-01-10 Thread Goswin Brederlow
> " " == Brian May <[EMAIL PROTECTED]> writes:

> "zhaoway" == zhaoway  <[EMAIL PROTECTED]> writes:
zhaoway> This is only a small part of the whole story, IMHO. See
zhaoway> my other email replying to you. ;)

>>> Maybe there could be another version of Packages.gz without
>>> the extended descriptions -- I imagine they would take
>>> something like 33% of the Packages file, in line count at
>>> least.

zhaoway> Exactly. DIFF or RSYNC method of APT (as Goswin pointed
zhaoway> out), or just separate Descriptions out (as I pointed out
zhaoway> and you got it too), nearly 66% of the bits are
zhaoway> saved. But this is only a hack, albeit efficient.

 > At the risk of getting flamed, I investigated the possibility
 > of writing an apt-get method to support rsync. I would use this
 > to access an already existing private mirror, and not the main
 > Debian archive. Hence the server load issue is not a
 > problem. The only problem I have is downloading several megs of
 > index files every time I want to install a new package (often
 > under 100kb) from unstable, over a volume charged 28.8 kbps PPP
 > link, using apt-get[1].

I tried the same, but I used the copy method as a template, which was
rather bad. I should have used the http method as a starting point.

Can you send me your patch, please?

 > I think (if I understand correctly) that I found three problems
 > with the design of apt-get:

 > 1. It tries to down-load the compressed Packages file, and has
 > no way to override it with the uncompressed file. I filed a bug
 > report against apt-get on this, as I believe this will also be
 > a problem with protocols like rproxy too.

 > 2. apt-get tries to be smart and passes the method a
 > destination file name that is only a temporary file, and not
 > the final file. Hence, rsync cannot make a comparison between
 > local and remote versions of the file.

I wrote to the deity mailing list about those two problems, proposing
two possible solutions. So far the only answer I have got was "NO, we
don't want rsync", after pressing the issue here on debian-devel.

 > 3. Instead, rsync creates its own temporary file while
 > downloading, so apt-get cannot display the progress of the
 > download operation because as far as it is concerned the
 > destination file is still empty.

Hmm, isn't there an informational message you can output to indicate
the progress? We would have to either patch rsync to generate that
style of progress output, or fork, parse the output of rsync and pass
on an altered version.
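The fork-and-parse option might look roughly like the sketch below. It assumes that `rsync --progress` rewrites a status line in place (separated by carriage returns) containing an "NN%" figure; the exact column layout differs between rsync versions, so only the percentage is used. The helper names are hypothetical, and how the numbers would be handed back to apt-get is left open.

```python
import re

# Assumption: rsync progress lines contain a "NN%" figure; other columns
# (bytes, rate, ETA) vary between versions and are ignored here.
PERCENT = re.compile(r"(\d{1,3})%")

def split_updates(chunk):
    """Split raw rsync output into individual status lines (CR or LF separated)."""
    return [part for part in re.split(r"[\r\n]+", chunk) if part.strip()]

def parse_percent(line):
    """Return the last percentage found on a progress line, or None."""
    found = PERCENT.findall(line)
    return int(found[-1]) if found else None

def bytes_done(line, total_size):
    """Convert the percentage into a byte count a front end could display.

    total_size would come from the index (the Size: field); how this is
    reported back to apt-get is hypothetical.
    """
    pct = parse_percent(line)
    if pct is None or total_size is None:
        return None
    return total_size * pct // 100
```

Parsing only the percentage keeps the wrapper tolerant of rsync output changes, at the cost of coarse (1%) granularity.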

 > I think the only way to fix both 2 and 3 is to allow some
 > coordination between apt-get and rsync where to put the
 > temporary file and where to find the previous version of the
 > file.

Thinking about it some more, I like the second solution to the
problem more and more:

1. Include a template (some file that apt-get thinks matches best) in
the fetch request. The rsync method can then copy that file to the
destination and run rsync on it. This would be the uncompressed
Packages file, a previous .deb or the old source.

2. Return whether the file is compressed or not simply by passing
back the destination filename with the appropriate extension (.gz), so
the destination filename is altered to reflect the file format.
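A minimal sketch of the two steps, under heavy assumptions: the function names are hypothetical, this is not apt's actual method interface, and the rsync run itself is omitted (the demo only simulates its result). Step 1 seeds the destination from the template so rsync has a basis file; step 2 reports the file format purely through the returned name.

```python
import gzip
import os
import shutil
import tempfile

def seed_from_template(template, destination):
    """Step 1: copy the best-matching local file to the destination."""
    if template and os.path.exists(template):
        shutil.copyfile(template, destination)
        return True
    return False  # no basis file: rsync would simply transfer everything

def report_final_name(destination, compressed):
    """Step 2: signal the file format through the returned name (.gz suffix)."""
    if compressed and not destination.endswith(".gz"):
        os.rename(destination, destination + ".gz")
        return destination + ".gz"
    return destination

def open_fetched(path):
    """Caller side: pick the decoder from the name the method returned."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)

def demo():
    with tempfile.TemporaryDirectory() as d:
        template = os.path.join(d, "Packages.old")
        with open(template, "w") as f:
            f.write("Package: foo\n")
        plain = os.path.join(d, "partial-1")
        seed_from_template(template, plain)  # rsync would now update this file
        with open_fetched(report_final_name(plain, compressed=False)) as f:
            seeded = f.read()
        packed = os.path.join(d, "partial-2")
        with gzip.open(packed, "wt") as f:  # pretend only a .gz was offered
            f.write("Package: bar\n")
        final = report_final_name(packed, compressed=True)
        with open_fetched(final) as f:
            unpacked = f.read()
        return seeded, os.path.basename(final), unpacked
```

The appeal of step 2 is that no new protocol field is needed: the caller already knows how to treat a name ending in .gz.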

MfG
Goswin




Re: big Packages.gz file

2001-01-09 Thread Brian May
> "Brian" == Brian May <[EMAIL PROTECTED]> writes:

Brian> Note: [1] Normally I try to find the files manually via
Brian> lynx, but right at the moment this is rather difficult, as
Brian> I seem to try numerous directories but not get the expected
Brian> result. Some packages 

Damn - sent that message before I had finished typing :-(

Anyway, I meant to say "some packages are hard to find manually, as
long as they haven't all been moved to the package pool system yet".
-- 
Brian May <[EMAIL PROTECTED]>




Re: big Packages.gz file

2001-01-09 Thread Brian May
> "zhaoway" == zhaoway  <[EMAIL PROTECTED]> writes:

zhaoway> This is only a small part of the whole story, IMHO. See
zhaoway> my other email replying to you. ;)

>> Maybe there could be another version of Packages.gz without the
>> extended descriptions -- I imagine they would take something
>> like 33% of the Packages file, in line count at least.

zhaoway> Exactly. DIFF or RSYNC method of APT (as Goswin pointed
zhaoway> out), or just separate Descriptions out (as I pointed out
zhaoway> and you got it too), nearly 66% of the bits are
zhaoway> saved. But this is only a hack, albeit efficient.

At the risk of getting flamed, I investigated the possibility of
writing an apt-get method to support rsync. I would use this to access
an already existing private mirror, and not the main Debian
archive. Hence the server load issue is not a problem. The only
problem I have is downloading several megs of index files every time I
want to install a new package (often under 100kb) from unstable, over
a volume charged 28.8 kbps PPP link, using apt-get[1].

I think (if I understand correctly) that I found three problems with
the design of apt-get:

1. It tries to down-load the compressed Packages file, and has no way
to override it with the uncompressed file. I filed a bug report
against apt-get on this, as I believe this will also
be a problem with protocols like rproxy too.

2. apt-get tries to be smart and passes the method a destination file
name that is only a temporary file, and not the final file. Hence,
rsync cannot make a comparison between local and remote versions of
the file.

3. Instead, rsync creates its own temporary file while downloading, so
apt-get cannot display the progress of the download operation because
as far as it is concerned the destination file is still empty.

I think the only way to fix both 2 and 3 is to allow some coordination
between apt-get and rsync where to put the temporary file and where to
find the previous version of the file.

Note:
[1] Normally I try to find the files manually via lynx, but right
at the moment this is rather difficult, as I seem to try numerous
directories but not get the expected result. Some packages
-- 
Brian May <[EMAIL PROTECTED]>




Re: big Packages.gz file

2001-01-09 Thread Goswin Brederlow
> " " == Brian May <[EMAIL PROTECTED]> writes:

> "sluncho" == sluncho  <[EMAIL PROTECTED]> writes:
sluncho> How hard would it be to make daily diffs of the Packages
sluncho> file? Most people running unstable update every other day
sluncho> and this will require downloading and applying only a
sluncho> couple of diff files.

sluncho> The whole process can be easily automated.

 > Sounds remarkably like the process (weekly not daily though) to
 > distribute Fidonet nodelist diffs. Also similar to kernel
 > diffs, I guess too.

 > Seems a good idea to me (until better solutions like rproxy are
 > more fully implemented), but you have to be careful not to
 > apply diffs in the wrong order.  -- Brian May <[EMAIL PROTECTED]>

Or missing one, or having a corrupted file to begin with, or any of a
thousand other possibilities.

Also, mirrors will always lag behind, have erratic timestamps on
those files, and so on. I think it would become a mess pretty soon.

The nice thing about rsync is that it's self-repairing. It's also more
efficient than a normal diff.
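Why rsync repairs itself follows from how it works: the receiver sends checksums of fixed-size blocks of whatever local file it has, and the sender looks for those blocks at every byte offset of the new file using a cheap rolling checksum, confirmed by a strong hash. A stripped-down sketch of that idea, after the rsync technical report by Tridgell and Mackerras, with simplifications: MD5 stands in for the strong checksum, the trailing partial block is not indexed, and there is no networking. A stale or damaged local copy just means fewer matching blocks and more literal bytes; the result is still the new file.

```python
import hashlib

M = 1 << 16  # both halves of the weak checksum are kept modulo 2**16

def weak_checksum(block):
    """rsync-style weak checksum (a, b) of one block."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b

def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b

def signature(old, n):
    """Receiver: weak -> [(strong, block index)] for each full n-byte block."""
    sig = {}
    for idx in range(len(old) // n):  # simplification: partial tail skipped
        block = old[idx * n:(idx + 1) * n]
        sig.setdefault(weak_checksum(block), []).append((hashlib.md5(block).digest(), idx))
    return sig

def delta(sig, new, n):
    """Sender: ("copy", idx) for blocks the receiver has, ("literal", bytes) otherwise."""
    ops, literal, i = [], bytearray(), 0
    ab = weak_checksum(new[:n]) if len(new) >= n else None
    while i + n <= len(new):
        hit = None
        candidates = sig.get(ab, [])
        if candidates:  # weak match: confirm with the strong hash
            strong = hashlib.md5(new[i:i + n]).digest()
            hit = next((idx for s, idx in candidates if s == strong), None)
        if hit is None:
            literal.append(new[i])
            if i + n < len(new):
                ab = roll(ab[0], ab[1], new[i], new[i + n], n)
            i += 1
            continue
        if literal:
            ops.append(("literal", bytes(literal)))
            literal = bytearray()
        ops.append(("copy", hit))
        i += n
        if i + n <= len(new):
            ab = weak_checksum(new[i:i + n])
    literal.extend(new[i:])
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops

def patch(old, ops, n):
    """Receiver: rebuild the new file from its own blocks plus the literals."""
    out = bytearray()
    for kind, value in ops:
        out += old[value * n:(value + 1) * n] if kind == "copy" else value
    return bytes(out)
```

Unlike a diff, nothing here depends on knowing which version the receiver holds, which is what makes missed updates and odd mirror timestamps harmless.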

MfG
Goswin




Re: big Packages.gz file

2001-01-09 Thread Brian May
> "sluncho" == sluncho  <[EMAIL PROTECTED]> writes:

sluncho> How hard would it be to make daily diffs of the Packages
sluncho> file? Most people running unstable update every other day
sluncho> and this will require downloading and applying only a
sluncho> couple of diff files.

sluncho> The whole process can be easily automated.

Sounds remarkably like the process (weekly not daily though) to
distribute Fidonet nodelist diffs. Also similar to kernel diffs, I
guess too.

Seems a good idea to me (until better solutions like rproxy are more
fully implemented), but you have to be careful not to apply diffs in
the wrong order.
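Diffs applied in the wrong order, a missing diff or a corrupted starting file can at least be detected if every diff records a checksum of the file it applies to and of the file it produces. A minimal sketch, using a diff format invented for illustration (changed stanzas keyed by package name) and MD5 from Python's hashlib; on any mismatch the client would fall back to fetching the full Packages.gz.

```python
import hashlib

def parse(text):
    """Split a Packages file into {package name: stanza}."""
    stanzas = {}
    for chunk in text.strip().split("\n\n"):
        for line in chunk.splitlines():
            if line.startswith("Package:"):
                stanzas[line.split(":", 1)[1].strip()] = chunk
                break
    return stanzas

def render(stanzas):
    """Canonical text form (sorted by name) so both sides hash the same bytes."""
    return "\n\n".join(stanzas[name] for name in sorted(stanzas)) + "\n"

def digest(stanzas):
    return hashlib.md5(render(stanzas).encode()).hexdigest()

def make_diff(old_text, new_text):
    """Archive side: changed stanzas (None = removed) plus both checksums."""
    old, new = parse(old_text), parse(new_text)
    changes = {name: new.get(name) for name in old.keys() | new.keys()
               if old.get(name) != new.get(name)}
    return {"base": digest(old), "result": digest(new), "changes": changes}

def apply_chain(local_text, diffs):
    """Client side: apply diffs in order, refusing any that do not fit."""
    stanzas = parse(local_text)
    for diff in diffs:
        if digest(stanzas) != diff["base"]:
            raise ValueError("diff does not match local file; fetch full Packages.gz")
        for name, stanza in diff["changes"].items():
            if stanza is None:
                stanzas.pop(name, None)
            else:
                stanzas[name] = stanza
        if digest(stanzas) != diff["result"]:
            raise ValueError("result checksum mismatch; fetch full Packages.gz")
    return render(stanzas)
```

This only detects the failure cases; it does not repair them, which is the difference from rsync.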
-- 
Brian May <[EMAIL PROTECTED]>




Re: big Packages.gz file

2001-01-09 Thread sluncho
On Tue, Jan 09, 2001 at 11:40:01PM +1100, Hamish Moffatt wrote:
> On Tue, Jan 09, 2001 at 06:04:58PM +0900, Miles Bader wrote:
> > Hamish Moffatt <[EMAIL PROTECTED]> writes:
> > > What is the real problem with the large package files? They take a long
> > > time to download, but so do emacs and other bloatware.

> > The packages file gets downloaded _every single time_ you do an update,
> > and for those of us with a slow modem link, that really sucks.
> 
> True enough. I haven't really been following the discussion, to be honest.
> 
> Maybe there could be another version of Packages.gz without the
> extended descriptions -- I imagine they would take something like
> 33% of the Packages file, in line count at least.

Please excuse me if I am jumping into the discussion unprepared or if
this has already been mentioned.

How hard would it be to make daily diffs of the Packages file? Most people
running unstable update every other day and this will require downloading
and applying only a couple of diff files.

The whole process can be easily automated.

Sluncho <[EMAIL PROTECTED]>




Re: big Packages.gz file

2001-01-09 Thread zhaoway
From: Hamish Moffatt <[EMAIL PROTECTED]>
Subject: Re: big Packages.gz file
Date: Tue, 9 Jan 2001 19:59:13 +1100

> On Tue, Jan 09, 2001 at 03:04:10AM +0800, zhaoway wrote:
> > A big package index IMHO is the current bottleneck of Debian package system.
> 
> What is the real problem with the large package files? They take a long
> time to download, but so do emacs and other bloatware.

The problem, IMHO, is this: ;)

Every once in a while, when you want to update a package to the newest
version, you have to update the package index first. If you look into
this problem, that is not strictly necessary. And the size of the
package index is constantly growing.

With Emacs, nearly all of the bits are necessary for the
functionality, and you don't download it for every trivial update
task. And it is not growing in size as rapidly as the package index is.

Looking further ahead, if we allow translations of the Packages index,
it could become even bigger. Or, if we allow multiple versions of a
package into the package pool (as Manoj mentioned in another thread),
a big Packages index could be even more troublesome.

Hope I make myself clearer. ;) And thank you for discussing this with me! ;)

--
echo < */
EOF




Re: big Packages.gz file

2001-01-09 Thread zhaoway
From: Hamish Moffatt <[EMAIL PROTECTED]>
Subject: Re: big Packages.gz file
Date: Tue, 9 Jan 2001 23:40:01 +1100

> On Tue, Jan 09, 2001 at 06:04:58PM +0900, Miles Bader wrote:
> > The packages file gets downloaded _every single time_ you do an update,
> > and for those of us with a slow modem link, that really sucks.

This is only a small part of the whole story, IMHO. See my other email
replying to you. ;)

> Maybe there could be another version of Packages.gz without the
> extended descriptions -- I imagine they would take something like
> 33% of the Packages file, in line count at least.

Exactly. DIFF or RSYNC method of APT (as Goswin pointed out), or just
separate Descriptions out (as I pointed out and you got it too),
nearly 66% of the bits are saved. But this is only a hack, albeit
efficient.

Because this does not solve the package pool's problem within the
package pool system itself; it works on the protocol and client tool
side instead.

1) AIUI, the package pool should be a storage system, which should have
a smart algorithm for deleting packages that no distribution or other
package references. (Garbage collection by reference counts.)

2) A distribution, leaving aside the work of our honoured release
manager, should be a partial package index listing, and thus should be
separated from the storage system. The current ``testing''
distribution doesn't do this well enough. (Thus, it has rules on
upload frequency.)
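Point 1) is reference-counting garbage collection. A minimal sketch, assuming the pool is a set of file names, each distribution is nothing more than a set of names it lists (point 2), and a hypothetical `refs` map records which pool entries keep which others alive (for example a binary keeping its source). As with any plain reference counting, entries that only reference each other in a cycle are never freed.

```python
from collections import Counter

def collect(pool, distributions, refs):
    """Return the pool entries that nothing references any more.

    pool:          set of pool file names
    distributions: {suite: set of pool file names it lists}
    refs:          {entry: set of entries it keeps alive} (hypothetical relation)
    """
    count = Counter()
    for listed in distributions.values():
        count.update(listed)
    for entry in pool:
        count.update(refs.get(entry, ()))
    dead = set()
    pending = [entry for entry in pool if count[entry] == 0]
    while pending:
        entry = pending.pop()
        if entry in dead:
            continue
        dead.add(entry)
        # Deleting an entry drops the references it held; this may cascade.
        for target in refs.get(entry, ()):
            count[target] -= 1
            if count[target] == 0 and target in pool:
                pending.append(target)
    return dead
```

Because a distribution is only an index here, dropping a package from "unstable" never deletes a file that "stable" still lists.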

With these two things in mind, RSYNC can help very little, and the
package pool's indexing problem remains. In my previous letters, I
tried to start a discussion about one humble attempt of mine to help. ;)

As soon as I have enough time, and there has been enough discussion, I
may write a more complete document. But I need discussion first. Thanks!

--
echo < */
EOF




Re: big Packages.gz file

2001-01-09 Thread Hamish Moffatt
On Tue, Jan 09, 2001 at 06:04:58PM +0900, Miles Bader wrote:
> Hamish Moffatt <[EMAIL PROTECTED]> writes:
> > What is the real problem with the large package files? They take a long
> > time to download, but so do emacs and other bloatware.
> 
> Yeah, but how often do you download emacs?

Never, I wouldn't touch that thing with a 40 foot barge pole!

> The packages file gets downloaded _every single time_ you do an update,
> and for those of us with a slow modem link, that really sucks.

True enough. I haven't really been following the discussion, to be honest.

Maybe there could be another version of Packages.gz without the
extended descriptions -- I imagine they would take something like
33% of the Packages file, in line count at least.
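The suggestion amounts to dropping the continuation lines that follow each Description: field while keeping its one-line synopsis. A minimal sketch, assuming the usual Packages layout in which extended description lines start with a space; the helper names are made up for illustration.

```python
def strip_long_descriptions(packages_text):
    """Keep each 'Description:' synopsis line, drop its indented continuation lines."""
    out = []
    in_description = False
    for line in packages_text.splitlines():
        if line.startswith("Description:"):
            in_description = True
            out.append(line)
        elif in_description and line.startswith((" ", "\t")):
            continue  # part of the extended description
        else:
            in_description = False  # another field or the blank stanza separator
            out.append(line)
    return "\n".join(out) + "\n"

def saved_line_fraction(packages_text):
    """Share of lines removed, i.e. the 'in line count at least' measure."""
    before = len(packages_text.splitlines())
    after = len(strip_long_descriptions(packages_text).splitlines())
    return (before - after) / before if before else 0.0
```

For a local copy of Packages.gz one would pass in `gzip.open(path, "rt").read()`; the saving in compressed bytes will not necessarily match the line-count figure.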


Hamish
-- 
Hamish Moffatt VK3SB <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>




Re: big Packages.gz file

2001-01-09 Thread Miles Bader
Hamish Moffatt <[EMAIL PROTECTED]> writes:
> What is the real problem with the large package files? They take a long
> time to download, but so do emacs and other bloatware.

Yeah, but how often do you download emacs?

The packages file gets downloaded _every single time_ you do an update,
and for those of us with a slow modem link, that really sucks.

-Miles
-- 
Love is a snowmobile racing across the tundra.  Suddenly it flips over,
pinning you underneath.  At night the ice weasels come.  --Nietzsche




Re: big Packages.gz file

2001-01-09 Thread Hamish Moffatt
On Tue, Jan 09, 2001 at 03:04:10AM +0800, zhaoway wrote:
> A big package index IMHO is the current bottleneck of Debian package system.

What is the real problem with the large package files? They take a long
time to download, but so do emacs and other bloatware.


Hamish
-- 
Hamish Moffatt VK3SB <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>




Re: Linux Gazette [Was: Re: big Packages.gz file]

2001-01-08 Thread Adrian Bridgett
On Mon, Jan  8, 2001 at 18:20:16 +0100 (+), Andreas Fuchs wrote:
> On 2001-01-07, Goswin Brederlow
> <[EMAIL PROTECTED]> wrote:
> > zhaoway> 1) It prevents many more packages from coming into Debian;
> > zhaoway> for example, the newest issues of the Linux Gazette are not
> > zhaoway> present in Debian. People occasionally got fucked up by packages
> 
> > Any reasons why the Linux Gazette is not present anymore?
> > And is there a virtual package for the Linux Gazette that always
> > depends on the newest version?
> 
> Another solution would be to have only an installer which installs the
> latest version of the LG from a server that keeps it. Keeps the
> Packages.gz file clean, and LG readers happy.
> 
> Or am I missing something?

To answer the questions:

a) it is present but I haven't updated it in a while (busy).  Wouter Verhelst
has offered to take over the package but he's new to packaging so things are
taking a bit of time.

b) nope - I haven't done a virtual "latest" package yet, there is a bug about
it I think (or Wouter suggested it).

c) personally, I like the LG since I find the issues useful - I found useful
articles in all the ones I read.  Unfortunately, since I left uni I haven't
been sufficiently bored to remember to download and read them (and hence to
package them).

d) I was hoping the "data" section of Debian would get into policy so I
could move the packages there and out of main.

Adrian

Email: [EMAIL PROTECTED]
Windows NT - Unix in beta-testing. GPG/PGP keys available on public key servers
Debian GNU/Linux  -*-  By professionals for professionals  -*-  www.debian.org




Re: big Packages.gz file

2001-01-08 Thread calvin
Hello,

On Tue, Jan 09, 2001 at 03:04:10AM +0800, zhaoway wrote:
> * To separate Packages.gz out, along with each package, as another separate
>   file. Caesar's belong to Caesar. ;)
>   i.e., each pkg_ver-sub_arch.deb with a pkg_ver-sub_arch.idx
No, that's not a win. You would end up checking timestamps for thousands
of files on every update.
I liked the idea of alphabetical splitting into Packages-[a-z0-9].gz.
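On the archive side, the alphabetical split could be produced roughly as follows. A minimal sketch with hypothetical helper names; the bucket names follow the Packages-[a-z0-9].gz pattern above, and compressing and writing the files is left out. The second function shows what a client would have to refetch after an update.

```python
from collections import defaultdict

def package_name(stanza):
    """Return the value of the stanza's Package: field, or None."""
    for line in stanza.splitlines():
        if line.startswith("Package:"):
            return line.split(":", 1)[1].strip()
    return None

def split_by_initial(packages_text):
    """Group stanzas by the first character of the package name."""
    buckets = defaultdict(list)
    for stanza in packages_text.strip().split("\n\n"):
        name = package_name(stanza)
        if name:
            buckets[name[0]].append(stanza)
    return {f"Packages-{k}.gz": "\n\n".join(v) + "\n" for k, v in buckets.items()}

def files_to_refetch(old_text, new_text):
    """Only the buckets whose content changed need to be downloaded again."""
    old, new = split_by_initial(old_text), split_by_initial(new_text)
    return sorted(name for name in old.keys() | new.keys()
                  if old.get(name) != new.get(name))
```

A few dozen bucket files keep the timestamp checks cheap, while a single changed package still only costs one bucket.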

> * At the same time, provide a big Packages.gz by collecting these small
>   files for compatibility. Or, maybe even a trimmed Packages.gz by removing
>   all of the Description:s.
Yup, just keep a copy of Packages.gz and provide backwards compatibility.

Bastian Kleineidam




Re: big Packages.gz file

2001-01-08 Thread zhaoway
On Sun, Jan 07, 2001 at 05:18:02PM -0500, Chris Gray wrote:
> > Brian May writes:
> bm> What do large packages have to do with the size of the index file,
> bm> Packages?
> 
> I think the point was that every package adds about 30-45 lines to the
> Packages file.  You don't need to download any of the Linux Gazette to
> have the 33 lines each issue takes up in the Packages file.

A big package index IMHO is the current bottleneck of Debian package system.
While most people are more interested in RSYNC as the cure, IMHO RSYNC
is overkill and not a clean kill. It prevents easy mirroring of Debian by
requiring an RSYNC service on the mirror system, and it won't solve the
pool's problem, only provide a hack. ;)

OTOH, a relatively straightforward solution is:

* To separate Packages.gz out, along with each package, as another separate
  file. Caesar's belong to Caesar. ;)
  i.e., each pkg_ver-sub_arch.deb with a pkg_ver-sub_arch.idx
* At the same time, provide a big Packages.gz by collecting these small
  files for compatibility. Or, maybe even a trimmed Packages.gz by removing
  all of the Description:s.
* Optionally, provide hard links or symlinks along with each package, e.g.,
  pkg_[stable|unstable|testing]_arch.idx -> pkg_ver-sub_arch.idx
  Note: this won't hurt mirrors; OTOH it could even help partial mirrors.
* And enable multiple versions of a package in the package pool.
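The layout in this list can be generated mechanically from per-package stanzas. A minimal sketch with hypothetical helper names; the file names follow the pkg_ver-sub_arch.idx and pkg_[suite]_arch.idx patterns above, the epoch is dropped as it is in .deb file names in the pool, and the links are returned as a name-to-target map rather than created on disk. The per-suite texts play the role of the compatibility Packages.gz.

```python
from collections import defaultdict

def strip_epoch(version):
    """Pool file names leave out the epoch: "1:2.0-1" -> "2.0-1"."""
    return version.split(":", 1)[-1]

def idx_name(package, version, arch):
    """pkg_ver-sub_arch.idx, named like the matching .deb."""
    return f"{package}_{strip_epoch(version)}_{arch}.idx"

def suite_link(package, suite, arch):
    """pkg_<suite>_arch.idx, the optional per-suite link name."""
    return f"{package}_{suite}_{arch}.idx"

def build_pool_index(entries):
    """entries: iterable of (package, version, arch, suites, stanza).

    Returns (files, links, per_suite):
      files     -- {idx file name: stanza}, one small index per pool package
      links     -- {link name: idx file name}, to become hard links or symlinks
      per_suite -- {suite: Packages text} collected for compatibility
    """
    files, links, per_suite = {}, {}, defaultdict(list)
    for package, version, arch, suites, stanza in entries:
        name = idx_name(package, version, arch)
        files[name] = stanza.rstrip("\n") + "\n"
        for suite in suites:
            links[suite_link(package, suite, arch)] = name
            per_suite[suite].append(stanza.rstrip("\n"))
    return files, links, {s: "\n\n".join(v) + "\n" for s, v in per_suite.items()}
```

Two versions of the same package can sit side by side in `files`, each suite's link pointing at the one it uses, which is the "multiple versions in the pool" item.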

This way, a general package index becomes optional, and release management
could move towards more fine-tuned task-*-like packages. Nothing lost. ;)

Just for discussion, I would be glad to hear criticism. ;)

-- 
echo < */
EOF




Re: Linux Gazette [Was: Re: big Packages.gz file]

2001-01-08 Thread Andreas Fuchs
On 2001-01-07, Goswin Brederlow
<[EMAIL PROTECTED]> wrote:
> zhaoway> 1) It prevents many more packages from coming into Debian;
> zhaoway> for example, the newest issues of the Linux Gazette are not
> zhaoway> present in Debian. People occasionally got fucked up by packages

> Any reasons why the Linux Gazette is not present anymore?
> And is there a virtual package for the Linux Gazette that always
> depends on the newest version?

Another solution would be to have only an installer which installs the
latest version of the LG from a server that keeps it. Keeps the
Packages.gz file clean, and LG readers happy.

Or am I missing something?

-- 
Andreas Fuchs, <[EMAIL PROTECTED]>, <[EMAIL PROTECTED]>, antifuchs
Hail RMS! Hail Cthulhu! Hail Eris! All hail Discordia!




Linux Gazette [Was: Re: big Packages.gz file]

2001-01-07 Thread Goswin Brederlow
> " " == Chris Gray <[EMAIL PROTECTED]> writes:

> Brian May writes:
> "zhaoway" == zhaoway  <[EMAIL PROTECTED]> writes:
zhaoway> 1) It prevents many more packages from coming into Debian;
zhaoway> for example, the newest issues of the Linux Gazette are not
zhaoway> present in Debian. People occasionally got fucked up by packages

Any reasons why the Linux Gazette is not present anymore?

And is there a virtual package for the Linux Gazette that always
depends on the newest version?

MfG
Goswin




Re: big Packages.gz file

2001-01-07 Thread Chris Gray
> Brian May writes:

> "zhaoway" == zhaoway  <[EMAIL PROTECTED]> writes:
zhaoway> 1) It prevents many more packages from coming into Debian;
zhaoway> for example, the newest issues of the Linux Gazette are not
zhaoway> present in Debian. People occasionally got fucked up by packages
zhaoway> like anachism-doc because of the precious bandwidth. And
zhaoway> there is occasional discussion of L10N packages disturbing
zhaoway> the lives of others who don't need them.

bm> ...only if you download and install the package in question.

bm> What do large packages have to do with the size of the index file,
bm> Packages?

I think the point was that every package adds about 30-45 lines to the
Packages file.  You don't need to download any of the Linux Gazette to
have the 33 lines each issue takes up in the Packages file.

Cheers,
Chris

-- 
Got jag?  http://www.tribsoft.com




Re: big Packages.gz file

2001-01-06 Thread Sam Couter
> On 2001-01-05, Brian May <[EMAIL PROTECTED]> wrote:
> > What do large packages have to do with the size of the index file,
> > Packages?

Andreas Fuchs <[EMAIL PROTECTED]> wrote:
> They waste one byte per multiple of 10 bytes of package size. (-;

You mean one byte per order of magnitude of package size. ;)

> Bad joke? So sue me.

Yes, very bad. I couldn't resist correcting, which makes me at least as bad.
-- 
Sam Couter  |   Internet Engineer   |   http://www.topic.com.au/
[EMAIL PROTECTED]|   tSA Consulting  |
OpenPGP key available on key servers
OpenPGP fingerprint:  A46B 9BB5 3148 7BEA 1F05  5BD5 8530 03AE DE89 C75C




Re: big Packages.gz file

2001-01-06 Thread Andreas Fuchs
On 2001-01-05, Brian May <[EMAIL PROTECTED]> wrote:
> What do large packages have to do with the size of the index file,
> Packages?

They waste one byte per multiple of 10 bytes of package size. (-;

Bad joke? So sue me.
-- 
Andreas Fuchs, <[EMAIL PROTECTED]>, <[EMAIL PROTECTED]>, antifuchs
Hail RMS! Hail Cthulhu! Hail Eris! All hail Discordia!




Re: big Packages.gz file

2001-01-05 Thread Brian May
> "zhaoway" == zhaoway  <[EMAIL PROTECTED]> writes:

zhaoway> 1) It prevents many more packages from coming into Debian;
zhaoway> for example, the newest issues of the Linux Gazette are not
zhaoway> present in Debian. People occasionally got fucked up by packages
zhaoway> like anachism-doc because of the precious bandwidth. And
zhaoway> there is occasional discussion of L10N packages disturbing
zhaoway> the lives of others who don't need them.

...only if you download and install the package in question.

What do large packages have to do with the size of the index file,
Packages?

zhaoway> 2) They have a FIX TIME problem. I.e., if you don't RSYNC
zhaoway> or DIFF for a long time, they won't save you extra
zhaoway> bandwidth. While my approach does.

You only download what has changed. Nothing more, nothing less.

I could equally argue, if you wait a while, then exactly one package
in each section will change, causing you to have to re-download all
index files.

I am not trying to argue that your method is a bad idea, but please
try and get your facts straight first.


Now back on topic: another similar alternative to rsync might be
protocols like rproxy, which add rsync capabilities to
HTTP. Apparently the authors want to include this functionality (not
sure what time frame they are talking about here) in Squid and
Apache. This would mean rsync support in apt-get may be less
important; you would just need to force it to download Packages, not
Packages.gz.
-- 
Brian May <[EMAIL PROTECTED]>