Hi Colin, Quoting Colin Watson (2022-05-19 01:36:37) > On Sun, May 15, 2022 at 12:20:57AM +0200, Johannes Schauer Marin Rodrigues > wrote: > > I now have a patch (attached). > > Thanks for your patch! > > I found a few problems with it. Don't worry about sending a new patch > to address these; I have fixes in my local tree for most things I've > commented on here and am working on the rest. I'm just letting you know > where things stand.
thank you!! :) > First, we should add the appropriate Gnulib modules for upstream portability. > Then: > > > @@ -356,8 +356,8 @@ static void add_dir_entries (MYDBM_FILE > > * or . files (such as current, parent dir). > > */ > > > > - dir = opendir (infile); > > - if (!dir) { > > + n = scandir(infile, &namelist, NULL, alphasort); > > IMO we might as well move the filtering currently being done in the > while loop below to a scandir filter function. > > I would prefer not to use alphasort here, because it's locale-dependent > (not that it will matter very much in practice with the sorts of file > names that typically appear in manual page directories, but I can > imagine edge cases). A variant that's deliberately locale-independent > is only a few lines of code. That makes sense. Thank you. > > @@ -367,13 +367,13 @@ static void add_dir_entries (MYDBM_FILE > > > > /* strlen(newdir->d_name) could be replaced by newdir->d_reclen */ > > > > - while ((newdir = readdir (dir)) != NULL) { > > - if (*newdir->d_name == '.' && > > - strlen (newdir->d_name) < (size_t) 3) > > + while (n--) { > > I guess this might have been borrowed from the example in the scandir(3) > manual page, because it goes in reverse order; I don't think there's a > good reason to do that. In fact, my patch is borrowed from existing patches that the reproducible builds team submitted to packages and... > > + free(namelist); > > This leaks memory. We need to free all the elements of this list as > well. ...all these patches are missing a free() of the individual list members, so they all leak memory. Thanks a lot for spotting this -- I guess now I can prepare a few more patches for other packages. This would not've happened if I indeed had read the scandir(3) manual page more carefully, which does free() correctly. XD > > order_files (infile, &names); > > This means that the conversion to scandir in this function is in > practice going to be ineffective, because we immediately turn around and > re-sort the list by the physical locations of the first blocks of the > corresponding files. Won't this have just as much of an effect on > reproducibility in principle, even if it doesn't happen to affect > mmdebstrap in your tests, presumably due to something like disk order > typically being similar between runs if your disk isn't too full? This is interesting. I've tested my patch by executing man-db on a filesystem mounted with disorderfs --shuffle-dirents=yes and observed that using scandir() was somehow necessary in both locations. > I'm experimenting with simply removing the order_files call here, on the > basis that other performance improvements to mandb(8) have made it less > critical. It does seem to slow things down slightly even on an SSD, though > that may be measurement error; I still need to compare timings on a > rotational drive (but I will). Thanks! > More interestingly, it changes accessdb(8) output in ways that aren't just > obvious consequences of sorting (multi keys change due to that, but as far as > I can tell that's fine). So far I've spotted the following issues: > > * A number of new entries are introduced for some reason (and the > question is why they were missing beforehand). > > * The targets of WHATIS_MAN references change (admittedly in cases > where there are multiple possibilities, but we should be picking a > deterministic one rather than just the first). > > * Symlinks flip between ULT_MAN and SO_MAN depending on whether the > symlink happens to sort before its target. > > * The database's idea of whether pages require processing via tbl(1) > (and probably other preprocessors) changes in some cases. (Fixed in > https://gitlab.com/cjwatson/man-db/-/commit/1873051fdb.) > > I think these are mostly existing bugs, but I intend to fix them first > before attempting to land your patch, because this is all delicate > enough that I really want to be sure of exactly what's changing. (Since > these are all order-dependent bugs, it may even be that fixing all these > will make parts of your patch unnecessary; but we'll see.) > > I will keep working on this, and expect to be able to get a new release out > in a week or two. Thanks a lot for working on this! Once you think you are done, feel free to ping me in case you'd like me to run my test suite with man-db including all your changes, making sure that they have the desired effect in the environment I'm running it in. Currently, my workaround is to create a chroot in a TMPDIR mounted as tmpfs. It seems that the tmpfs directory entries are somehow returned deterministically even across different tmpfs mounts. Thanks! cheers, josch
signature.asc
Description: signature