Hello, I would like to start a discussion about dpkg.

I have written some patches that make it run much faster (on my computer the new dpkg -s is over 700 times faster; that is not a mistake :)). However, the changes make the dpkg database backwards incompatible with the current version, so I'll try to justify my decisions below. Please tell me what you think about it. Oh, and please forgive my awful English. :)

I have the feeling that dpkg wasn't designed to hold as many packages as there are in Debian today -- it keeps everything in large, non-indexed text files. It also lacks some useful features, e.g. the ability to make sophisticated queries against the package database. Even simple queries take a very long time (tens of seconds on slower machines).

In this case a complete rewrite would probably be the best solution. But, as I found in the Debian archives, people have wanted to make dpkgv2 since 1999, and there have been no results :) So I decided to find solutions to the things that annoy me most:

- There should be no such thing as /var/lib/dpkg/available -- I think that dpkg shouldn't know anything about packages that are not installed. Higher-level package management tools such as apt already have that information, and that is enough. Moreover, parsing the available file needlessly takes a lot of time in most dpkg operations. Dselect could also use the apt database instead of the available file.

- Parsing /var/lib/dpkg/status also takes a lot of time, so there should be a better way of storing that information. As putting it into a binary database might be controversial, I thought that splitting the file would be the best solution -- every installed package should have its own *.status file in the /var/lib/dpkg/info directory.
  This makes recreating the original status file (for backward compatibility) very simple, just:

      # cat /var/lib/dpkg/info/*.status >/var/lib/dpkg/status

- As dpkg -S is very slow (I know there is dlocate, but it is only a workaround, not a real solution), there should be a binary database that records which package every file belongs to. It would be generated from the *.list files, so the primary information would still be in good old text files.

- It should be possible to make more complicated queries. I like grep-dctrl very much. I also think that many features could be taken from rpm...

- As /var/lib/dpkg/info contains a lot of files (and if we add status files there it will contain even more), maybe there should be the _possibility_ (but not the _need_) to use one large indexed file that would hold its contents. When I archived my info dir using ar, the output file took only half of the space occupied by the directory. Maybe there should also be optional on-the-fly (de)compression of these files. There are servers where filesystem space is limited (e.g. systems on flash), so that would give Debian more flexibility.

So here is what I have done so far. In short, I implemented the features mentioned in the first two points of the list above. The current version of the patch may be obtained here:

    http://nh.pl/~michau/proj/dpkg/

It modifies dpkg and dselect so that they do the following:

- dselect reads /var/lib/apt/lists/*Packages instead of /var/lib/dpkg/available.
- dpkg doesn't read or write /var/lib/dpkg/available anymore.
- dpkg reads and writes /var/lib/dpkg/info/(package name).status files instead of /var/lib/dpkg/status.
- Records about purged, not installed packages are considered uninformative and aren't saved.
- A new field, maybedirty, has been added to the pkginfo structure -- a record is considered for dumping to the appropriate status file only if it is set to 1 (so when we install some package, only one status file is written, not all).
- Query code has been modified so that it doesn't read the whole package database anymore if it doesn't have to.
- As there is no available database, dpkg -p runs apt-cache show, which should provide the same information.

As a result dpkg is much faster -- simple execution time comparisons (available at http://nh.pl/~michau/proj/dpkg/time.txt) show that the patched dpkg may be about 6 times faster than the original (while installing and uninstalling packages). For some queries, such as dpkg -s and dpkg -L, the patched version of dpkg is over 700 times faster.

What do you think about it? Is this the way dpkg should go? Yes, my code needs several improvements, and I want to implement the rest of the features I mentioned above, but I'd like to hear your opinions first...

Cheers
--
michau@

Oh no I've set too much / I haven't set enough
I thought that I straced you sleeping / I thought that I straced you run
I think I thought I saw core dumped
[ R.A.M., "Loosing my revision" ]
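
To make the per-package status files described above more concrete, here is a minimal sketch in Python. It is not taken from the patch (which modifies dpkg's C code); the function names, the temporary directory and the sample records (including the version numbers) are all made up for illustration. It splits a status file into one stanza per package, writes only the packages marked as dirty (the role the maybedirty field plays in the patch), and rebuilds the combined file by plain concatenation, as the cat command does.

```python
import re
import tempfile
from pathlib import Path


def split_status(text):
    """Parse status-file text into {package name: stanza text}."""
    stanzas = {}
    # Stanzas are separated by one or more blank lines.
    for stanza in re.split(r"\n{2,}", text.strip()):
        match = re.search(r"^Package:\s*(\S+)", stanza, re.MULTILINE)
        if match:
            stanzas[match.group(1)] = stanza + "\n"
    return stanzas


def write_dirty(info_dir, stanzas, dirty):
    """Write only the stanzas of packages marked dirty; return the paths written."""
    written = []
    for name in sorted(dirty):
        path = Path(info_dir) / f"{name}.status"
        # The trailing blank line keeps stanzas separated after concatenation.
        path.write_text(stanzas[name] + "\n")
        written.append(path)
    return written


def rebuild_status(info_dir):
    """Rough equivalent of: cat info/*.status > status"""
    files = sorted(Path(info_dir).glob("*.status"))
    return "".join(p.read_text() for p in files)


# Made-up sample data, for illustration only.
SAMPLE = """\
Package: bash
Status: install ok installed
Version: 2.05b-7

Package: dpkg
Status: install ok installed
Version: 1.10.10
"""

stanzas = split_status(SAMPLE)
info_dir = tempfile.mkdtemp()  # stand-in for /var/lib/dpkg/info

# Initial conversion: every package is dirty, so every file is written.
write_dirty(info_dir, stanzas, stanzas.keys())

# Later, upgrading one package rewrites one file only.
stanzas["bash"] = stanzas["bash"].replace("2.05b-7", "2.05b-8")
written = write_dirty(info_dir, stanzas, {"bash"})
print([p.name for p in written])

# The concatenated files parse back to the same records.
print(split_status(rebuild_status(info_dir)) == stanzas)
```

The point of the sketch is only that the combined file can be regenerated from the small files at any time, while an install or upgrade touches a single small file instead of rewriting the whole database.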

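
The file-to-package database proposed above for dpkg -S (not implemented in the current patch) could look roughly like the following Python sketch. SQLite is used here purely as a convenient stand-in for "some binary database"; the table layout, the function names and the sample *.list contents are assumptions for illustration, and the lookup does exact path matches only, not the pattern search that dpkg -S offers.

```python
import sqlite3
import tempfile
from pathlib import Path


def build_index(info_dir, db_path=":memory:"):
    """Build a path -> package index from every <package>.list file in info_dir."""
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS files (path TEXT, package TEXT)")
    db.execute("CREATE INDEX IF NOT EXISTS files_path ON files (path)")
    for list_file in sorted(Path(info_dir).glob("*.list")):
        package = list_file.stem  # "bash.list" -> "bash"
        rows = [(line, package)
                for line in list_file.read_text().splitlines() if line]
        db.executemany("INSERT INTO files VALUES (?, ?)", rows)
    db.commit()
    return db


def owners(db, path):
    """Return the sorted names of packages that list exactly this path."""
    rows = db.execute(
        "SELECT package FROM files WHERE path = ? ORDER BY package", (path,))
    return [package for (package,) in rows]


# Made-up *.list files in a temporary stand-in for /var/lib/dpkg/info.
info_dir = Path(tempfile.mkdtemp())
(info_dir / "bash.list").write_text("/.\n/bin\n/bin/bash\n")
(info_dir / "coreutils.list").write_text("/.\n/bin\n/bin/ls\n")

db = build_index(info_dir)
print(owners(db, "/bin/bash"))
print(owners(db, "/bin"))
```

Because the index is derived entirely from the *.list files, it can be thrown away and regenerated at any time, which matches the idea that the text files remain the primary source of information.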
