Good day.

I have been asked by Feanor to write this. Please consider it an idea, a draft. It is open for discussion, but my personal opinion is that we should invest in this idea.

I have approached Landof and BBraun of the OpenDarwin project, asking them whether they would be interested in addressing the public together when it comes to gathering resources for possible distfile mirrors.
BBraun answered with a rather long mail, to which I responded. Since I did not shorten his mail significantly, you may read his mail and my answers in the transcript below.


There are a few issues which need to be addressed, mainly a few alterations to the fink code that have to be carefully discussed. I cannot really decide on them, nor do I know what exactly is needed.

I would suggest that either Benjamin, Drm or Max meet with landof and bbraun to discuss the actual code changes.
What I have in mind to strengthen the position of Fink AND OpenDarwin is pretty simple. I would like to see how hard it is to consolidate the parts of the target audience that the two projects share. I would like to address the public in the name of Fink AND OpenDarwin to collect more resources for both projects.


As this is something I am able to work on without having to deal with porting or recompiling, I can concentrate better on it.

Please do advise us. I have CC'd landof and bbraun to keep them up to date.




--Transcript-


From: David <[EMAIL PROTECTED]>
Date: Wed Mar 19, 2003  10:44:40 Europe/Vienna
To: bbraun <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: Distfile mirroring

-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


On Wednesday, March 19, 2003, at 08:35, bbraun wrote:



Before I answer, I would like to add a short preamble that explains why this mail was sent to you all. I have talked to landof and bbraun on IRC because I believe that we should coordinate our efforts to expose Fink and OpenDarwin to the world. I am, as you might know, rather well trained when it comes to promotion and press work. Fink has established a rather good turnaround rate with the press contacts we have. The moment we post a significant announcement, it is most likely to be picked up by the editors of large websites, including key websites such as macnn.com. To get us resources, I would like to do the following sooner or later:



a) Write a short article on Fink/OpenDarwin.
b) Ask those news sources if they are interested in helping us.
c) Actively go out and ask universities and other institutions to provide us with mirror resources.


This is a complex task and we have to tackle it together. This really is a case where I would like to point out that we are stronger together.

Distfile mirroring
------------------

<snip>

The first, and most obvious problem for a distfile mirror is the
retrieval of the distfiles. This means the mirror must identify
the files needed to be mirrored, fetch them, and store them locally.
Identifying the files to be mirrored is much more difficult than it
sounds on the surface. Currently, we must have darwinports and fink
installed locally on the mirror, and they each download their distfiles
independently. I propose that neither project actually download their
distfiles internally, but rather inform the mirror of which files
they need downloaded. The reason will become clear in the next
paragraph.


As I understand it, and I can only speak for Fink, we will sooner or later have three different types of files in Fink that we could mirror:

        a) The info text files
        b) The .deb files from a binary distribution
        c) The mirrored source tar.gz's for all info files in Fink.

Now, if we used rsync or CVSup, it should be fairly easy to create a setup that is very selective about which files may be mirrored. There are of course minor issues, but writing our own "proxy" in Perl should not be too hard. If that does not scale, I am quite willing to donate a few hours of C coding.
I will be the first to admit that I do not understand the problem fully yet, so bear with me.


The next problem is making sure the mirror is accurate.  This means
that as many of the distfiles are present as possible, and there is
nothing there that should not be.  Over time, we do not want the
mirrors accumulating old versions of distfiles that are no longer
used.  This places an unnecessary burden on the mirrors.  Since
neither fink nor darwinports (or any other project that would join
the project in the future) knows what distfiles the other project
requires, and the mirroring software does not download the distfiles
itself, there is no way to purge the unused distfiles.

I can only agree with bbraun here. We should leave the actual transfer of the files involved in building a mirror to rsync or CVSup, or even straight CVS (which I would call deprecated in this setup). Rsync and CVSup know very well how to handle the situation described above. Not to mention that the software also ensures that all data has been transferred properly and that no faulty distfiles are lying around due to software or network faults.



 This is
why I proposed that each project inform the mirror of what distfiles
it needs, and the mirror will then coalesce the distfiles, and
download each distfile only once.  Since the mirror now knows all
the distfiles that each project requires, it is capable of pruning
old distfiles that are no longer used.
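
The coalesce-and-prune idea above can be sketched roughly as follows. This is only an illustration under assumptions: the manifest layout (a mapping from project name to the distfile names it reports) and the function names coalesce, missing and prune are hypothetical, since neither fink nor darwinports defines such an output format in this thread.

```python
from pathlib import Path


def coalesce(manifests):
    """Union the distfile names requested by every project.

    `manifests` is an assumed structure: {project_name: [distfile names]}.
    A file wanted by several projects appears only once in the result.
    """
    wanted = set()
    for names in manifests.values():
        wanted.update(names)
    return wanted


def missing(mirror_dir, wanted):
    """Return the wanted distfiles that are not yet present on the mirror."""
    present = {p.name for p in Path(mirror_dir).iterdir() if p.is_file()}
    return sorted(wanted - present)


def prune(mirror_dir, wanted):
    """Delete files that no project requests any more; return their names."""
    removed = []
    for path in sorted(Path(mirror_dir).iterdir()):
        if path.is_file() and path.name not in wanted:
            path.unlink()
            removed.append(path.name)
    return removed
```

Because the master sees the union of all manifests, it can both fetch each file once (missing) and drop anything no project lists (prune), which is the property neither project has on its own.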

Yes, the master has to be the one machine that determines what is to be mirrored and in what fashion. With something like rsync, it is as easy as deleting the file on the master and having the mirrors run rsync with --delete.

The mirror also should be almost entirely automated. Mirror operators
should not be required to do anything other than install the software.
Ever. This may not be entirely practical, but should be a goal to
be worked towards. This means that 1) the mirroring software should
be self sustaining, 2) the darwinports and fink software should be
able to handle future changes. Handling future changes can be addressed
in a number of ways, either by self updating as part of the mirroring
process, or by simply ensuring that future changes are backward
compatible. What I'd like to avoid here is mirror operators needing
to hack on code to get the mirroring working, having to run copies
of darwinports or fink or mirroring software that was hacked up by
someone else, or otherwise using unsupported features.


As I said, I do not think we should mess around with such issues. We should concentrate on our system and how we can alter it so that we can use services that are well established, such as rsync (which grew out of academic work on how to mirror remote files with the least amount of data transferred) or CVSup. They are well established and most institutions have no qualms about installing them. They might be very hesitant about our own software.

I envision only one site (a 'master' site) needing to download all
the distfiles from various vendors.

Agreed.


Once that site has downloaded
all the distfiles, other mirror sites can simply rsync the distfiles
over. This would reduce the cost of being a mirror, both in terms
of bandwidth utilized, and in setup cost. If a mirror simply needs
to run rsync from cron, that is much lower cost than installing
darwinports, fink, and mirroring software.

Once more agreed, I should have read on and not started rambling so soon ;)
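
To make the "just run rsync from cron" idea concrete, here is a minimal sketch of what a secondary mirror's job might look like. The helper names and any URL or path passed to them are hypothetical; only the rsync options -a (archive mode) and --delete (remove files that no longer exist on the sender) are real rsync flags.

```python
import subprocess


def rsync_command(source, dest, delete=True):
    """Build the rsync argument list for pulling distfiles from a master.

    With delete=True, files that were removed on the master are also
    removed locally, so the mirror does not accumulate stale distfiles.
    """
    cmd = ["rsync", "-a"]
    if delete:
        cmd.append("--delete")
    cmd += [source, dest]
    return cmd


def sync(source, dest):
    """Run rsync once and return its exit status (0 means success)."""
    return subprocess.run(rsync_command(source, dest), check=False).returncode
```

A mirror operator would then only need a cron entry that calls sync with the master's rsync location and a local directory; nothing from fink or darwinports has to be installed on the mirror.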

However, just because there only needs to be one 'master' mirror,
doesn't mean there can't be others.  I would anticipate the process
for becoming a master mirror to be fairly painless and all the pieces
necessary would be available to anyone that wanted to set one up.

We will have to have at least two master mirrors in completely different areas of the world. Not only does this help us withstand script kiddies and other idiotic attacks, it will also ensure that our system is as fault tolerant as possible.

There may also be benefits to having a single 'master' mirror. Some
mirroring features that have been discussed, such as mailing port/package
maintainers if md5's don't match distfiles, would be much more
convenient if there were only one 'master' mirror (port maintainers
would only get one copy of the mail, rather than n copies where n is
the number of 'master' mirrors).
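
The md5 check mentioned above could look something like the sketch below. It assumes a hypothetical manifest of (filename, expected md5, maintainer address) entries; the thread does not specify such a format, and actually sending the mail is left out. Grouping the mismatches per maintainer is what keeps it to one mail each.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of(path, chunk_size=65536):
    """Compute the md5 hex digest of a file without loading it whole."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatches_by_maintainer(mirror_dir, entries):
    """Group distfiles whose md5 differs from the expected one.

    `entries` is an assumed format: (filename, expected_md5, maintainer).
    Files that have not been fetched yet are skipped, not reported.
    """
    report = defaultdict(list)
    for name, expected, maintainer in entries:
        path = Path(mirror_dir) / name
        if not path.is_file():
            continue  # nothing downloaded yet, so nothing to compare
        if md5_of(path) != expected.lower():
            report[maintainer].append(name)
    return dict(report)
```

Each key of the returned dictionary would correspond to one notification mail listing all of that maintainer's mismatching distfiles.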


So, to accomplish this, several things need to happen.
1) darwinports and fink need to be able to output the information
required by the mirroring software.
2) the mirroring software needs to be written.

What do you actually see as the features of that mirroring software? It will run only on the mirrors, is that correct?

So, now for the allocation of work to make this a reality.
I propose that the OpenDarwin project host the first 'master' mirror site.
OpenDarwin has bandwidth and disk that can be donated to the cause.
Other mirror sites may feel free to rsync the distfiles from the
OpenDarwin mirror.


Agreed. Fink does not have the resources yet. I did ask Apple and other big vendors whether they would be interested in sponsoring us, but that more or less fell on deaf ears.

I volunteer to write the mirroring software.

I am more than happy to have a look at it as well. I come from the C/asm world, but I am sure that I could also help with Perl; I have been learning quite a bit of it lately.

This should be relatively
easy, and I should be able to get it done and functioning quickly once
darwinports and fink are able to provide it with the necessary information.


Landon has already volunteered to make darwinports output the necessary
information.


I'd like to get someone from the fink project to volunteer to add this
functionality to fink. As I said before, I'd like to make sure whatever
changes are made get folded back into the mainline project.


Feanor, is that something you could look at? We will all have to talk to Max Horn (the main Fink maintainer) to get this into the mainline. Since this is becoming crucial to Fink working well, though, it should be no problem at all.

More ideas?

- -d


- - we may race and we may run, but we can not undo what has been done.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (Darwin)

iD8DBQE+eOTMiW/Ta/pxHPQRA5V1AJ9qgYfm1vAHn5DU7selmCvc1ByUPwCfQwDO
FgtZIEmHsZLrjxjNW3lRSaM=
=lbrK
-----END PGP SIGNATURE-----


- "Deep into that darkness peering, long I stood there wondering, fearing,
- Doubting, dreaming dreams no mortal ever dared to dream before." Edgar Allan Poe - The Raven




_______________________________________________
Fink-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/fink-devel
