Hi,

On Fri, 16 Feb 2024 at 10:14, Timothy Sample <samp...@ngyro.com> wrote:
>> Can we consider that this report is now done? Because:
>>
>> 1. SWH supports ExtID and nar hash lookup.
>>
>> 2. Missing origins are currently ingested by SWH.
>> (via specific sources.json)
>
> I think that would be jumping the gun a little bit.
>
> In some sense, the report is only *done* when “stored” hits 100% (or
> close to it, with the remainder being stuff we are pretty sure no longer
> exists). This won’t happen just because of your second point there.

Just to be sure: we are speaking about Bioconductor only, right?

> When the historical “sources.json” is loaded, things will be much, much,
> better, sure. Sources will still be missing, though.

Yeah, sources will still be missing, but I expect that Bioconductor
sources will not be. The only remaining issue concerns the “annotation”
and maybe the “experiment” packages. However, here we hit the boundary
between code and data: annotation and experiment packages can be very
large, may be skipped by SWH, and contain little if any code, mostly
plain data.

We can still discuss what to do here, in this already long thread. :-)
Or we can open another thread for the specific case of Bioconductor
annotation and experiment packages.

> To me, this is an
> invitation to more subtle analysis, like weighing sources by their
> “importance” in the package graph. Then there’s still shortcomings with
> Disarchive that have to be resolved (which is work best guided by
> numbers in the report).

Yeah. But that seems a larger scope than the Bioconductor case, no?

> Also, it will always be a good idea to verify that things are working.
> Ideally this could be simpler (leveraging ExtID lookup) and continuous.

Indeed, checking that all Bioconductor sources can be retrieved from
SWH+Disarchive seems the way forward to closing this report. :-)

Cheers,
simon