Hi,

zimoun <zimon.touto...@gmail.com> wrote:

> On Fri, 29 Oct 2021 at 23:57, Ludovic Courtès <l...@gnu.org> wrote:

[...]

>> --8<---------------cut here---------------start------------->8---
>> $ SAMPLE_SIZE=200 ./pre-inst-env guile ~/src/guix-debugging/importer-accuracy.scm
>> […]
>> Accuracy for 'pypi' (200 packages):
>>   accurate: 58 (29%)
>>   different inputs: 142 (71%)
>>   different source: 0 (0%)
>>   inconclusive: 0 (0%)
>> Accuracy for 'cran' (200 packages):
>>   accurate: 176 (88%)
>>   different inputs: 23 (12%)
>>   different source: 1 (0%)
>>   inconclusive: 0 (0%)
>> --8<---------------cut here---------------end--------------->8---
>
> [...]
>
>> The script doesn’t do anything useful for crates because they have their
>> own way of representing inputs.  It doesn’t account for changes in
>> ‘arguments’ like zimoun suggested, meaning it’s overestimating
>> accuracy.
>
> It is already quite interesting results.  Because it shows upstream
> stability, IIUC.  Well, it means that running “guix import pypi” one
> months ago and running the sames now, 71% packages have different
> inputs.  Right?  It is because some metadata from PyPI changed, right?

No no; I’m assuming PyPI, CRAN, etc. provide the same info as they did
back when the package was imported (which is probably the case).

> Not because “guix import pypi” was doing wrong and now it does better,
> right?

I’m also assuming that the importer didn’t change significantly in the
meantime, which is probably a good approximation.

What I think those figures show is the amount of manual tweaks necessary
to get a proper package “à la Guix”, with tests running etc.  For PyPI
we often need to add things under ‘native-inputs’, hence the 71%
“different inputs” line.  For CRAN that’s sometimes necessary, but much
less frequently.  There are also cases with non-R/non-Python
dependencies.

> IMHO, it shows how PyPI allows bad practises about packaging, isn’t it?
>
> My understanding of this experiment is about upstream “quality”, not
> about importer “accuracy”.  Do I incorrectly understand?

Yes, in a way, assuming our importers are not lossy, this tells us
whether the upstream repo contains enough information and/or whether
that information is accurate.

Thanks,
Ludo’.
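
For illustration, here is a minimal sketch of how the four categories in
the output quoted above could be tallied.  It is not the
importer-accuracy.scm script, which is written in Guile and not shown in
this thread.  The Python below is only an assumption about the general
shape of such a comparison: the names ‘classify’ and ‘report’, the
dictionary layout with "source" and "inputs" keys, and the rule that a
failed re-import counts as “inconclusive” are all hypothetical.

```python
from collections import Counter

# Category names taken from the output quoted in the message.
CATEGORIES = ("accurate", "different inputs", "different source", "inconclusive")


def classify(imported, packaged):
    """Compare a fresh importer result with the package as committed.

    Both arguments are hypothetical dicts with "source" and "inputs" keys;
    'imported' is None when re-running the importer gave nothing usable
    (an assumption about what "inconclusive" means).
    """
    if imported is None:
        return "inconclusive"
    if imported["source"] != packaged["source"]:
        return "different source"
    # Assumption: all kinds of inputs, including native inputs, are
    # compared together as a set of names.
    if set(imported["inputs"]) != set(packaged["inputs"]):
        return "different inputs"
    return "accurate"


def report(pairs):
    """Return {category: (count, rounded percentage)} for (imported, packaged) pairs."""
    counts = Counter(classify(imported, packaged) for imported, packaged in pairs)
    total = sum(counts.values())
    if total == 0:
        return {category: (0, 0) for category in CATEGORIES}
    return {
        category: (counts[category], round(100 * counts[category] / total))
        for category in CATEGORIES
    }
```

Under these assumptions, a package whose committed definition gained an
extra input after import (say, a test dependency added by hand) lands in
“different inputs”, which is the situation described above for PyPI.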