I remember playing with Sumatra several years ago. I believe the approach is to track all that metadata in a SQLite db and then make it browsable/accessible with a Django web app.
http://neuralensemble.org/sumatra/

In the R world, many folks have taken to appending `sessionInfo()` or
`devtools::session_info()` to the end of an Rmd file to track attached
packages and so on. The latter also gives SHAs for packages installed from
GitHub. It wouldn't be that hard to also include a shell chunk with
`git rev-parse HEAD` to record the local repo's commit. Here's the old
discussion on this I remember from several years ago:
https://github.com/swcarpentry/DEPRECATED-site/issues/1085

Best,
Naupaka

> On Aug 12, 2018, at 6:30 AM, Bruce Becker via discuss
> <[email protected]> wrote:
>
> Hi Greg, all
>
> I'm not sure about the Bronze Age, but in the Baroque era my understanding is
> that this is the job of metadata. You need a lot of machinery to do this, but
> in this era data never lives "nakedly"; it is always accompanied by
> metadata which describes it. So you look up data by its persistent
> identifier in repositories, and deposit it, along with its changelog or
> whatever, in repositories.
>
> I am the first to concede that many, if not the vast majority, of data
> civilisations will never reach the Baroque Age, and perhaps others will skip
> it altogether, but this happens to be the civilisation I'm writing to you
> from. I'd hazard the suggestion that the Baroque Age is also known as the
> Open Science Age, just to be prickly.
>
> Have a great Sunday!
> Bruce
>
>> On Sun, 12 Aug 2018 at 15:15, Greg Wilson <[email protected]> wrote:
>> Hi,
>>
>> Back in the Stone Age, Software Carpentry's lessons spent a few minutes
>> discussing data provenance:
>>
>> - Include the string '$Id:$' in every source code file. Subversion
>>   would automatically fill in the revision ID on every commit, turning
>>   it into something like '$Id: 12345 $'.
>>
>> - Print the script's name, the commit ID, and the date in the header of
>>   every output file (along with all the parameters used by the script).
>>
>> It wasn't much, and I don't know how many people ever actually
>> implemented it, but it did allow you to keep track of which versions of
>> which scripts had generated which output files in a systematic way.
>>
>> So here we are today in what I hope is research computing's Bronze Age,
>> and I'm curious: what do you all actually do to keep track of data
>> provenance? What tools or methods do you use to record which programs
>> produced which output files from which input files, with which settings
>> and parameters? I was excited about the Open Provenance effort circa
>> 2006-07 (https://openprovenance.org/opm/), but it never seemed to catch
>> on. What are people using instead?
>>
>> Thanks,
>>
>> Greg
>>
>> --
>> If you cannot be brave – and it is often hard to be brave – be kind.

------------------------------------------
The Carpentries: discuss
Permalink: https://carpentries.topicbox.com/groups/discuss/Te1cade367c0ab4ee-Maa4849e5f43ef8009e5f87e0
Delivery options: https://carpentries.topicbox.com/groups/discuss/subscription
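Naupaka's suggestion of a shell chunk with `git rev-parse HEAD` can be sketched roughly like this. This is a minimal sketch, not anything prescribed in the thread: the "unknown" fallback and the dirty-tree check are my own additions, on the assumption that the script may sometimes run outside a git working copy.

```shell
#!/bin/sh
# Record the commit that produced this analysis run.
# Fall back gracefully when not inside a git checkout.
commit=$(git rev-parse HEAD 2>/dev/null || echo "unknown")

# Flag uncommitted changes, since the SHA alone can be misleading
# if the working tree differs from what was committed.
if [ -n "$(git status --porcelain 2>/dev/null)" ]; then
    dirty=" (uncommitted changes present)"
else
    dirty=""
fi

echo "commit: ${commit}${dirty}"
```

In an Rmd file this would live in a `bash` chunk next to the `sessionInfo()` call, so the rendered report carries both the package state and the repo state.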

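Greg's Stone Age recipe still works today with git standing in for Subversion's '$Id:$' keyword. A minimal sketch of the second bullet, printing the script's name, commit ID, date, and parameters in the header of an output file; the file name `results.csv` and the `#` comment style are assumptions for illustration, not anything the thread specifies:

```shell
#!/bin/sh
# Write a provenance header, then the script's real output beneath it.
commit=$(git rev-parse --short HEAD 2>/dev/null || echo "unknown")

{
    echo "# script:  $0"
    echo "# commit:  $commit"
    echo "# date:    $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "# params:  $*"
    # ... the actual results (e.g. CSV rows) would be printed below ...
} > results.csv
```

Anyone reading `results.csv` later can then trace it back to the exact script version and invocation that produced it, which is the systematic tracking Greg describes.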