l...@gnu.org (Ludovic Courtès) writes:

> Mark H Weaver <m...@netris.org> skribis:
>
>> I want to optimize it anyway, since it takes over 5 minutes to build
>> my profile, which is a bit painful.
>
> Oh, this much?
>
> I have 140 packages in my profile and it takes less than 30s to build
> it; that’s an SSD though, so that probably makes a big difference.

My profile has 155 packages.

I've looked over union.scm and can see some extreme wastefulness, most
notably in the use of 'others-have-it?'. I guess this typically does N
'lstat' calls for every file or directory in every package in the
resulting profile that cannot be pruned (an entry is pruned when it lies
in a directory that's only in one package), where N is the number of
packages in the profile.

I'm fairly sure it is possible to replace those N 'lstat' calls with
something that requires 0 system calls and at most O(log N) time.

The basic idea would be to iterate over all packages in a breadth-first
manner, as follows:

I think we should 'readdir' the top-level directories of every package,
and merge them together into a single map structure (vhash?) that maps
filenames to sets of packages containing that filename. We don't even
need to 'lstat' them at this point; all we need are the names.

After we've read the directories of every package, we then iterate over
the map. For any unique entries, we simply make a symlink in the new
profile. For duplicates we'd do conflict resolution, which starts with
an 'lstat'. For directories, the default conflict resolution would
simply create a directory in the new profile and then recurse into that
subdirectory, considering only the packages that contained that
subdirectory.

My guess is that this would speed up profile creation by at least an
order of magnitude, maybe more.

What do you think?

      Mark
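
The breadth-first scheme described above could be sketched roughly as
follows. This is an illustrative Python sketch, not the union.scm code:
'merge_listings' and 'build_union' are hypothetical names, a dict of
lists stands in for the vhash, and the first-package-wins rule for
conflicts that are not all directories is only a placeholder assumption.

```python
import os
import stat
from collections import defaultdict


def merge_listings(package_dirs):
    """Map each entry name to the list of package dirs containing it.

    Only the directory listings are read; nothing is lstat'ed here.
    """
    owners = defaultdict(list)
    for pkg in package_dirs:
        for name in os.listdir(pkg):
            owners[name].append(pkg)
    return owners


def build_union(package_dirs, target):
    """Fill the existing directory `target` with the union of package_dirs."""
    for name, pkgs in sorted(merge_listings(package_dirs).items()):
        dest = os.path.join(target, name)
        sources = [os.path.join(pkg, name) for pkg in pkgs]
        if len(sources) == 1:
            # Unique entry: symlink it without ever inspecting it.
            os.symlink(sources[0], dest)
            continue
        # Duplicate entry: conflict resolution starts with an lstat.
        dirs = [s for s in sources if stat.S_ISDIR(os.lstat(s).st_mode)]
        if len(dirs) == len(sources):
            # All directories: create a real one and recurse, considering
            # only the packages that contain this subdirectory.
            os.mkdir(dest)
            build_union(dirs, dest)
        else:
            # Placeholder policy (an assumption): the first package wins.
            os.symlink(sources[0], dest)
```

With two packages that both provide 'bin' but only one of which
provides 'share', the result would have a real 'bin' directory holding
one symlink per file, and a single symlink for 'share'. 'lstat' is
called only on names that appear in more than one package.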