Thanks Sean, this looks awesome! Many thanks for storing this. I'll see how I can process the data and might contact you off-list or via the issues on the repo.
Just by the numbers reported, I'm a bit surprised by the daily increment of the summary table. Bioconductor software has around 2000 packages, checked on 5 different machines, with 5 outputs each (install, build, check, bin, propagate), which gives that order of magnitude; but not all builds and checks are run every day (right now I cannot find the page where the frequency is reported).

At the moment I won't use the build and check reports, but I might be interested in them later (I also collect general check results from CRAN, without the log files).

In any case, I'll get in touch. Ideally, I would like to export/use this from a package, as I have done for CRAN via the repo.data package I'm building.

Best wishes and many thanks,

Lluís

On Thu, 20 Mar 2025 at 02:56, Sean Davis <seand...@gmail.com> wrote:
>
> Hi, all.
>
> Perhaps a bit tangential, but I capture the results of all build reports
> for all packages daily (that is the intent, anyway) going back a year or
> so (a couple of years if we dig into archives). The reports are processed
> using code in this repo: https://github.com/seandavi/BiocBuildDB using a
> GitHub Action that runs daily. This might not be exactly the format you
> are looking for, Lluis, but it does have a complete history of every
> build for every package for every day for all Bioc builds.
>
> The result is a set of three CSV files (one set for every build, about
> 3.5k CSV files right now) with rows for each package/machine/build step
> and the results of the build, including propagation status (whether the
> package gets pushed to release). Version numbers, git hashes, dates,
> Bioconductor versions, build commands, error logs, etc. are all captured.
> Thus, things like full-text search over captured log output are possible
> over time, across branches, and across machines or packages. The time at
> which a package enters the system is also captured.
> The build_summary table currently checks in at about 6M rows (again,
> without going into archive data) and adds about 20k rows per day.
>
> I have pending issues to expose the data but just haven’t prioritized the
> work. I’m happy to discuss access and use cases either in a new thread
> here, on Slack, or via GitHub issues.
>
> Sean
>
> From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of Lluís
> Revilla <lluis.revi...@gmail.com>
> Date: Wednesday, March 19, 2025 at 6:21 PM
> To: Kern, Lori <lori.sheph...@roswellpark.org>
> Cc: bioc-devel <bioc-devel@r-project.org>
> Subject: Re: [Bioc-devel] Bioconductor archive?
>
> Hi Lori,
>
> Many thanks for your answer. I have a couple of follow-up questions.
>
> > It looks like the Date/Publication field is only present when there was
> > a change on the branch post release (i.e., any package that has a
> > version x.y.(z+n) instead of x.y.0).
> > After a release is frozen and a new release occurs, Bioconductor does
> > not allow any changes or fixes, even to bugs. A release is frozen, so
> > there are no changes after the new release occurs.
>
> Thanks for reminding me of this. I'm interested in the x.y.(z+n) package
> versions that were published within each release, not just the last one
> or the initial one. Is this historical information available? The file at
> https://bioconductor.org/packages/3.20/bioc/VIEWS only includes the
> latest date of a given release, but there could have been an earlier
> package release within the same Bioconductor version.
>
> > I would have to dig in the history but my guess is 3.7 might be when we
> > either switched to git or started having archived versions so likely
> > not available before this date.
>
> I thought it would be difficult if not impossible to check this, but even
> for the current release I can't find this data. Does Bioconductor have an
> internal archive with this information?
> On CRAN, even if a package is removed, the activities of the archive are
> stored internally: each date-time of publication, archival, and removal.
> Does something similar happen in Bioconductor? Even if a given package is
> not available, knowing that there was a release could be helpful for
> reproducibility (as it could be compared with the git log).
>
> With that information, finding which package versions were used for a
> script given only a date could become easier.
>
> Best,
>
> Lluís
>
> > Lori Shepherd - Kern
> > Bioconductor Core Team
> > Roswell Park Comprehensive Cancer Center
> > Department of Biostatistics & Bioinformatics
> > Elm & Carlton Streets
> > Buffalo, New York 14263
> >
> > ________________________________
> > From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of Lluís
> > Revilla <lluis.revi...@gmail.com>
> > Sent: Saturday, March 15, 2025 5:20 AM
> > To: bioc-devel <bioc-devel@r-project.org>
> > Subject: [Bioc-devel] Bioconductor archive?
> >
> > Hi,
> >
> > Recently I learned, thanks to Martin Morgan, that there are some files
> > with the Date/Publication fields for Bioconductor packages:
> > https://bioconductor.org/packages/3.7/bioc/VIEWS
> > I'm trying to reconstruct which packages from CRAN and Bioconductor
> > were available at any moment, and it was very helpful.
> >
> > However, these files have the latest version published by a package on
> > a given Bioconductor release.
> > Is there a way to know if there were more updates after a release?
> > I thought about searching the git log for each package.
> > But that wouldn't be enough, as they might have increased their version
> > but not passed Bioconductor checks, and thus not been released.
> >
> > Related to this, this field is present from Bioconductor version 3.7
> > onward, but I couldn't find it in previous releases. Is there a way to
> > know previous packages' releases and their dates?
> >
> > Packages' updates on the release branch should only contain bug fixes,
> > but for reproducibility purposes it might be necessary to get the same
> > bugs again.
> >
> > Many thanks in advance,
> >
> > Lluís
> >
> > _______________________________________________
> > Bioc-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel