Hi folks,

In recent months, the problems with Fink's dependency engine (and dpkg's, and the way the two interact) have become more and more apparent. Several of them are basically impossible to overcome with the current design, so it seems we need a new, full-fledged dependency engine.

For some time, I hoped we might be able to just reuse the apt engine (and that hope is not completely gone yet), but as far as I can tell it can't cope with build-time-only dependencies (though there is a possible workaround, see below). If somebody is interested in looking into this (which means you have to know C++, and ideally also the apt sources, or be willing to dig into them), feel free to contact me.

So, for now, instead of charging ahead and trying to write a new dependency engine from scratch or retrofit an existing one, I tried to write down what our needs are. Based on that, I started to develop ideas on how to realize these needs in actual code. I try to present all my ideas and findings in this email. That includes a list of problematic cases the engine needs to handle, as well as fundamental problems, and problems that also affect our current system. It'll be a long email, and maybe I should put it on a web page later, too.


Basic needs and issues
======================

First off: the engine needs to deal with 4 basic types of dependency:
* build conflicts
* build dependencies
* install conflicts
* install dependencies

Note that the current "Depends" field in fink in fact represents both a build time *and* an install time dependency; in a future release of the fink package manager we will hence add new fields InstallDepends and InstallConflicts (names subject to change). This will increase flexibility w/o adding any complexity.
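To make the four kinds concrete, here is a minimal sketch of how a package record could keep them apart. The class, the helper and the package name "some-build-tool" are made up for illustration; none of this is existing fink code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PackageRelations:
    """Hypothetical record holding the four relation kinds for one package."""
    name: str
    build_depends: List[str] = field(default_factory=list)
    build_conflicts: List[str] = field(default_factory=list)
    install_depends: List[str] = field(default_factory=list)
    install_conflicts: List[str] = field(default_factory=list)


def from_legacy_depends(name, depends):
    """Expand today's blurred "Depends" list: it counts at build AND install time."""
    return PackageRelations(
        name=name,
        build_depends=list(depends),
        install_depends=list(depends),
    )


openssh = from_legacy_depends("openssh", ["openssl"])
# A build-only requirement can now be listed without affecting install time.
openssh.build_depends.append("some-build-tool")  # made-up package name
print(openssh.build_depends)
print(openssh.install_depends)
```

The point of the split is only that each list can be consulted on its own when planning a build or an install.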

Due to the way fink works, it's hard to use dpkg's engine properly; e.g. fink "knows" it'll install all requirements for a package, but dpkg will not know that unless we install them all at the same time. There are many more scenarios where install/remove fails because fink "knows more" than dpkg. The only way to overcome this is to use "--force-depends" at least for some operations. But if we do that, we *have* to handle all dependency issues ourselves (be it with a completely new engine or with an existing one, it doesn't matter). Right now fink gets off cheap, as it relies on the dpkg engine to remove conflicting packages automatically, or to refuse to remove a package which is still needed. If we override this, we have to do it ourselves, which increases the complexity.


Another trouble area is build time dependencies, and build time in general. Right now, fink installs build time dependencies if needed, but doesn't remove them later (which might or might not be the right approach, you can argue either way). We don't handle build conflicts at all. Also, there are issues when users run multiple finks at once (I do that frequently - no need to interrupt KDE building if I just want to quickly install figlet). Right now, this can easily lead to a screw-up. Just imagine openssh is building and the user removes the openssl package. Ouch: either the build fails, or it gets messed up because the openssh build now starts using Apple's openssl. If the versions of openssl differ, half of openssh ends up linked against openssl 0.9.7 and the other half against 0.9.6.
dpkg doesn't (and can't) handle build time dependencies at all. Fink should do that, but right now it does a very poor job at it.
apt only handles them in a very limited way (for the apt-get source command), not sufficient for our needs.


Ideas & Solutions
=================
I developed various ideas on how to tackle the problems above and other problems I encountered while researching this. Note that I am not finished with all this yet (one of the reasons for me to finally write this down was to get my thoughts in order, it's easy to get lost :-).
And basically, this assumes we write our own engine...

Note: when I say deps (=dependencies) in the following, I usually also mean conflicts, which are a kind of (anti)dependency. You can view the net of packages as a graph: the packages are nodes, the (anti)dependencies are directed edges. It's actually a bit more complicated, since a package can consist of many versions (PkgVersion in fink), and a dependency can specify version ranges. But the idea should be clear.
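As a rough illustration of the graph view, the sketch below stores edges by kind and walks them. It ignores versions entirely, and the class and the foo/bar/baz packages are invented for the example.

```python
from collections import defaultdict


class PackageGraph:
    """Hypothetical package graph: nodes are names, edges are typed."""

    def __init__(self):
        # (source package, edge kind) -> set of target packages
        self.edges = defaultdict(set)

    def add_edge(self, source, kind, target):
        self.edges[(source, kind)].add(target)

    def reachable(self, root, kind):
        """All packages reachable from root along edges of one kind."""
        seen = set()
        stack = [root]
        while stack:
            node = stack.pop()
            for target in self.edges.get((node, kind), ()):
                if target not in seen:  # also keeps cycles from looping forever
                    seen.add(target)
                    stack.append(target)
        return seen


graph = PackageGraph()
graph.add_edge("foo", "depends", "bar")
graph.add_edge("bar", "depends", "baz")
graph.add_edge("db3", "conflicts", "db4")
print(sorted(graph.reachable("foo", "depends")))
print(sorted(graph.reachable("db3", "conflicts")))
```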


First, I tried to break the problems down into small units. This makes reasoning easier. For example, some of the problems we used to have in the (limited) fink dep code were caused by the fact that the "Depends" field actually has a blurred meaning and is *not* the same as the dpkg "Depends" - because it also covers build time. Realizing this and getting it straight helped a lot.

The same goes for the difference between build & install time deps. Fink can either "install" or "build" or "build & install" a package. So it's natural to break that down, and if you do that, you end up with "build" objectives which have their dependencies, and almost completely independent "install" objectives, which have their own separate (but usually related) sets of dependencies.

That's a bit like the current approach the fink package manager uses to decide what to build: it has a queue of packages to build. If one runs "fink install foo", then initially the queue only contains "foo" (I am lying here, but it boils down to that). Fink then iterates over the dependencies of foo; any of them which are not installed get prepended to the queue. After that, fink iterates over the queue and builds/installs anything that has its dependencies fulfilled; any missing dependencies are again prepended to the queue, and so on.
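A heavily simplified sketch of that queue loop, just to pin down the idea. This is not the actual fink code; the function name and the cycle check are my additions, and versions, splitoffs and user prompts are left out.

```python
from collections import deque


def build_order(target, depends, installed):
    """Return the order in which packages get built and installed."""
    queue = deque([target])
    finished = set(installed)
    expanding = set()  # packages whose missing deps were already prepended
    order = []
    while queue:
        pkg = queue[0]
        missing = [d for d in depends.get(pkg, []) if d not in finished]
        if not missing:
            queue.popleft()
            expanding.discard(pkg)
            finished.add(pkg)  # built and immediately installed
            order.append(pkg)
            continue
        expanding.add(pkg)
        for dep in missing:
            if dep in expanding:
                raise ValueError(f"dependency cycle involving {dep}")
            if dep in queue:
                queue.remove(dep)  # move it to the front instead of duplicating
            queue.appendleft(dep)
    return order


deps = {"foo": ["bar", "baz"], "bar": ["baz"]}
print(build_order("foo", deps, installed=set()))
print(build_order("foo", deps, installed={"baz"}))
```

Note how building and installing are one inseparable step here; that coupling is exactly what the objectives idea is meant to break up.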

While this works well in many cases, it has its limitations. The code is pretty complicated, and a package is *always* installed immediately after it was built (with the notable exception of splitoffs, an exception which was made to fix certain problems, but which introduced a bunch of different ones).

So the idea is to change the queue to instead contain "objectives", three kinds of them in fact:
* build foo-1.0-1
* install bar-2.1-2
* remove qux

The "remove" objective is new, and the operations are split. This is needed because we may need to install db3 as a build dependency, then remove it to be able to install db4 which another package may have as a build dependency (a case fink currently can't handle, and which is partly responsible for our libpng/libpng3 problem).
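One possible way to represent such objectives as data, using the three example entries from the list above. The type names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    BUILD = "build"
    INSTALL = "install"
    REMOVE = "remove"


@dataclass(frozen=True)
class Objective:
    """Hypothetical queue entry: one action applied to one package."""
    action: Action
    package: str
    version: str = ""  # left empty where any version is meant, e.g. "remove qux"

    def __str__(self):
        target = f"{self.package}-{self.version}" if self.version else self.package
        return f"{self.action.value} {target}"


queue = [
    Objective(Action.BUILD, "foo", "1.0-1"),
    Objective(Action.INSTALL, "bar", "2.1-2"),
    Objective(Action.REMOVE, "qux"),
]
for objective in queue:
    print(objective)
```

Making the entries immutable and comparable keeps it cheap to reorder or deduplicate them later.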

Now the idea is to have one component which generates a batch of these commands.
By doing that in advance, we can detect problems *before* they occur, and immediately prompt the user on what to do (or just bail out). E.g. we could ask them "To install foo, the following packages have to be removed: bar, ... Do you want to continue?".
Furthermore, we can then insert an optimizer: it can group together installs / removes, or it can re-arrange the order of things; for example, it might first build 3 packages, and then install them all at once, instead of build-install-build-install-build-install. This fixes a big class of problems, and also causes some speedup. Of course the optimizer has to honor dependencies and all for this, but I think it's pretty much doable.
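A sketch of the simplest optimizer I can think of, assuming the plan is a list of (action, package) pairs: installs are held back and batched until a later build actually needs one of them, or until a remove comes along. Function and parameter names are invented, and only direct build deps are looked at.

```python
def batch_installs(plan, build_depends):
    """Group installs into batches without breaking build dependencies."""
    steps = []
    pending = []  # installs we are still holding back

    def flush():
        if pending:
            steps.append(("install", tuple(pending)))
            pending.clear()

    for action, pkg in plan:
        if action == "install":
            pending.append(pkg)
        elif action == "build":
            # A held-back package is needed for this build: install the batch first.
            if any(dep in pending for dep in build_depends.get(pkg, ())):
                flush()
            steps.append(("build", (pkg,)))
        else:
            flush()  # be conservative: never move an install across a remove
            steps.append((action, (pkg,)))
    flush()
    return steps


plan = [("build", "a"), ("install", "a"),
        ("build", "b"), ("install", "b"),
        ("build", "c"), ("install", "c")]
print(batch_installs(plan, {}))
print(batch_installs(plan, {"b": ["a"]}))
```

In the first call nothing needs anything at build time, so all three installs collapse into one batch at the end; in the second, "a" has to go in before "b" is built.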

The last stage is to execute the commands necessary to fulfill the objectives list. No dep checking is needed at this point, though one would probably still do it, a) for sanity checking and b) because some other process/the user may have messed with the installation in a way that breaks operations (e.g. openssl was removed). In the latter case, fink could bail out or offer the user to fix the issue or whatever.
[of course we could also make fink use the same locking mechanism as dpkg, thus avoiding any concurrent apt-get/dpkg/fink runs, but at least for me that would be an annoying limitation]
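The sanity check in the last stage could look roughly like this. It is a sketch under the assumption that each step knows which packages it needs and that the installed set is re-read before every step; all names here are hypothetical.

```python
class PlanInvalid(Exception):
    """Raised when the installation no longer matches what the plan assumed."""


def execute_plan(plan, needs, query_installed, run):
    for step in plan:
        # Re-read the state each time: another process may have changed it.
        missing = set(needs.get(step, ())) - set(query_installed())
        if missing:
            raise PlanInvalid(f"{step}: missing {sorted(missing)}")
        run(step)


installed = {"openssl"}
log = []
plan = [("build", "openssh")]
needs = {("build", "openssh"): {"openssl"}}

execute_plan(plan, needs, lambda: installed, log.append)
print(log)

installed.discard("openssl")  # somebody removed openssl in the meantime
try:
    execute_plan(plan, needs, lambda: installed, log.append)
except PlanInvalid as problem:
    print("bailing out:", problem)
```

Whether fink then bails out or offers a fix is a policy question; the check itself is cheap.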


IMHO the above approach is much cleaner than the current one in Fink, and hopefully easier to extend/debug.

So far, I haven't actually mentioned how to deal with the dependencies. That's because I wanted to first present the framework for the whole thing, before getting to the dirty and important details.

Why dependency deciding is difficult
====================================

Life would be easy if a dependency just said "install foo", and there was exactly one foo. However, foo may exist in 5 different versions; there might be packages that "Provides: foo". Furthermore, dependencies can be versioned ("Depends: foo (>= 1.0-1)") or can have alternatives ("foo | foo-ssl") or even combinations of this.
Also, existing packages may conflict with foo, so we may have to remove those, which may not be possible because other installed packages depend on them, which we then could remove, which may not be possible, .... etc. you get the idea :-) And that doesn't even take into account that we should ask the user for permission first.
Then, foo itself can conflict with installed stuff (same problem as in the previous paragraph).
Next, foo of course also drags along its own long chain of dependencies (which may be versioned, too), etc.
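To show how quickly even the "which packages satisfy this?" question grows, here is a sketch that handles alternatives, version constraints and Provides for one dependency string. Big caveats: the version comparison is a toy that only understands purely numeric versions (it is NOT dpkg's real algorithm), I simply assume that a Provides entry never satisfies a versioned clause, and all names and test data are made up.

```python
import operator
import re

OPS = {">=": operator.ge, "<=": operator.le, "=": operator.eq,
       ">>": operator.gt, "<<": operator.lt}
CLAUSE = re.compile(r"^\s*(\S+?)\s*(?:\(\s*(>=|<=|>>|<<|=)\s*(\S+)\s*\))?\s*$")


def parse_version(text):
    """Toy comparison key: "1.0-1" -> (1, 0, 1). Numeric versions only."""
    return tuple(int(part) for part in re.split(r"[.-]", text))


def candidates(dependency, available):
    """List (name, version) pairs from `available` that satisfy the dependency.

    `available` holds (name, version, provides) tuples.
    """
    found = []
    for alternative in dependency.split("|"):
        match = CLAUSE.match(alternative)
        if match is None:
            raise ValueError(f"cannot parse {alternative!r}")
        name, op, wanted = match.groups()
        for pkg, version, provides in available:
            if pkg == name:
                if op is None or OPS[op](parse_version(version), parse_version(wanted)):
                    found.append((pkg, version))
            elif op is None and name in provides:
                found.append((pkg, version))  # unversioned clause met via Provides
    return found


available = [
    ("foo", "0.9-1", ()),
    ("foo", "1.2-1", ()),
    ("foo-ssl", "1.0-1", ()),
    ("bar", "3.0-1", ("foo",)),
]
print(candidates("foo (>= 1.0-1) | foo-ssl", available))
print(candidates("foo", available))
```

And this only produces candidates; it says nothing yet about conflicts or about the candidates' own dependencies.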

This can lead to three cases:
1) we can find one version of "foo" that leads to no issues
2) we can find more than one possibility
3) we can find none
Case 1 is nice of course. In case 2, we can just use one at random like apt-get does. But of course our current approach is nice: ask the user which to use (system-xfree86 or xfree86-rootless? system-tetex or tetex-texmf... you get the idea). Of course, that decision should then immediately be taken into account before asking the user the next question (a well known problem is that you have to tell fink several times whether to use system-tetex or not, even though it could answer all of the following questions based on the first one - in fact it's wrong to offer the choice again).
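The "don't ask twice" part can be sketched like this. The function is hypothetical, the conflicts map is assumed to be symmetric, and the `ask` callback stands in for a real prompt.

```python
class Unsatisfiable(Exception):
    """Case 3: no candidate is compatible with what was already chosen."""


def choose(options, conflicts, selected, ask):
    # Drop everything that clashes with an earlier decision.
    viable = [c for c in options if not (conflicts.get(c, set()) & selected)]
    if not viable:
        raise Unsatisfiable(options)
    for candidate in viable:
        if candidate in selected:
            return candidate  # decided earlier, so no question is asked
    choice = viable[0] if len(viable) == 1 else ask(viable)  # case 1 / case 2
    selected.add(choice)
    return choice


conflicts = {
    "system-xfree86": {"xfree86-rootless"},
    "xfree86-rootless": {"system-xfree86"},
}
selected = set()
questions = []


def ask(options):
    questions.append(list(options))
    return options[0]  # stand-in for the user's answer


first = choose(["system-xfree86", "xfree86-rootless"], conflicts, selected, ask)
second = choose(["system-xfree86", "xfree86-rootless"], conflicts, selected, ask)
print(first, second, len(questions))
```

The second call returns without prompting, because the first answer already rules out the alternative.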

Case 3 is the nasty one. But it can occur very easily, as the libpng/libpng3 case shows (or in the past, evolution building was complicated by this issue due to db31 and db3 conflicting).
Anyway, should this happen, we may still be able to operate. That's because some things don't have to be installed permanently: build dependencies. We can (and should!) remove those again after a package has been built. Handling this cleverly isn't trivial either, but at least it's possible.
If even that doesn't help (because the user chose to install ghostscript while system-ghostscript is installed), fink could determine a set of packages to remove to make the install possible (in the example, system-ghostscript). In the past, we left that to dpkg, but we have to know about this, too, to avoid certain problems (and because it seems we have to call dpkg with --force-depends, though I'll try as hard as possible to avoid that in most cases). BTW, note that apt-get does this, too, for reasons similar to the ones I think apply to us.
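Here is a sketch of the "temporary build dependency" idea for the db3/db4 situation. The planner is hypothetical, the packages "needs-db3" and "needs-db4" are invented, and the error branch marks the spot where a real engine would have to work out a removal set and ask the user.

```python
def plan_builds(builds, build_depends, conflicts, installed):
    """Return (action, package) steps; temporary build deps are removed again."""
    state = set(installed)
    plan = []
    for pkg in builds:
        temporary = []
        for dep in build_depends.get(pkg, ()):
            if dep in state:
                continue
            blockers = conflicts.get(dep, set()) & state
            if blockers:
                # A permanently installed package is in the way.
                raise RuntimeError(f"{dep} conflicts with installed {sorted(blockers)}")
            plan.append(("install", dep))
            state.add(dep)
            temporary.append(dep)
        plan.append(("build", pkg))
        for dep in reversed(temporary):
            plan.append(("remove", dep))
            state.discard(dep)
    return plan


build_depends = {"needs-db3": ["db3"], "needs-db4": ["db4"]}
conflicts = {"db3": {"db4"}, "db4": {"db3"}}
for step in plan_builds(["needs-db3", "needs-db4"], build_depends, conflicts, set()):
    print(*step)
```

With nothing installed up front, both packages can be built one after the other even though db3 and db4 can never coexist; with db3 installed permanently, the second build hits the error branch.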


Now, all this still doesn't say much about how to go about the actual dependency handling. But since this email is already very long, I will write up my thoughts on that in a second mail. I'll also explain the chroot/fakeroot approach for package building and how it would help us in many, many ways (at the cost of more time/disk space, though).
That said, the second mail might come tomorrow, as it's late (1 AM) and I need sleep. Feel free to rip this one apart in the meantime (but keep in mind that a second part will follow).


Cheers,

Max

