I have used Nipype and contributed a bit to it a few years ago; there we use the W3C PROV Data Model:
https://nipype.readthedocs.io/en/latest/devel/provenance.html
http://prov.readthedocs.io/en/latest/index.html
https://www.w3.org/TR/prov-dm/

Cheers

On Mon, 13 Aug 2018, 04:19 Robert M. Flight via discuss, <
[email protected]> wrote:

> Hi Greg,
>
> I've been doing a bunch of work where we really want to track which
> versions of things were used, and also the basic methods, and finally what
> input data was used.
>
> What I ended up implementing was a bunch of JSON metadata.
>
> At raw file copying, a JSON metadata file gets generated with the original
> location, new location, and the SHA256 checksum of the original data.
>
> After a transformation step, more metadata is extracted during processing,
> including which version of the software did the conversion. This gets added
> to the raw copying metadata.
>
> A final processing step occurs using a custom R package, and for that
> package, I have a git hook that increments the package version on every
> commit, so every commit has a corresponding version number. Also, because
> it is a local custom pkg, if I have a clean git repo, the git SHA is added
> to the pkg metadata at install. During processing, I have a function that
> gets the parent pkg metadata, including the SHA if it exists, and adds it
> to another JSON metadata file. It also takes the main data class object
> that gets processed, strips the data bits, and writes a JSON representation
> of the main class (so all the methods are encoded as JSON), which also
> becomes part of the JSON metadata, in addition to saving a binary
> representation of the class with the data attached. All this is added to
> the previously existing metadata.
>
> Ideally, I would be capturing the version numbers for all of the pkgs
> that my code imports as well, but I haven't gone that far.
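
As a rough illustration of the raw-copy step described above (original location, new location, SHA256 of the original data, with later steps added to the same metadata), here is a minimal Python sketch. The function names, the JSON keys and the ".json" sidecar naming are assumptions made up for this example; it is not Robert's actual implementation.

```python
import hashlib
import json
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_metadata(source, destination):
    """Copy a raw file and write a JSON sidecar describing the copy."""
    source, destination = Path(source), Path(destination)
    checksum = sha256_of(source)  # checksum of the ORIGINAL data
    shutil.copy2(source, destination)
    metadata = {
        "original_location": str(source.resolve()),
        "new_location": str(destination.resolve()),
        "sha256": checksum,
    }
    # Hypothetical naming convention: data.raw -> data.raw.json
    sidecar = destination.with_name(destination.name + ".json")
    sidecar.write_text(json.dumps(metadata, indent=2))
    return metadata


def append_step(sidecar, step):
    """Append a later processing step (e.g. tool name and version) to the sidecar."""
    sidecar = Path(sidecar)
    metadata = json.loads(sidecar.read_text())
    metadata.setdefault("steps", []).append(step)
    sidecar.write_text(json.dumps(metadata, indent=2))
    return metadata
```

A conversion step could then call something like append_step("copy.dat.json", {"tool": "converter", "version": "1.2.0"}), where the tool name and version are placeholders, so the record accumulates alongside the data.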
>
> Cheers,
>
> -Robert
>
> On Sun, Aug 12, 2018 at 7:40 PM Damien Irving via discuss <
> [email protected]> wrote:
>
>> Hi Greg,
>>
>> I've written a Data Carpentry lesson on data provenance, which makes use
>> of a very simple package I've written called cmdline-provenance:
>>
>> - Lesson:
>> https://data-lessons.github.io/python-aos-lesson/09-provenance/index.html
>> - Package: http://cmdline-provenance.readthedocs.io/en/latest/
>>
>>
>> Cheers,
>> Damien
>>
>> On Sun, Aug 12, 2018 at 9:13 AM, Greg Wilson <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> Back in the Stone Age, Software Carpentry's lessons spent a few minutes
>>> discussing data provenance:
>>>
>>> - Include the string '$Id:$' in every source code file - Subversion
>>> would automatically fill in the revision ID on every commit to turn it into
>>> something like '$Id: 12345'.
>>>
>>> - Print the script's name, the commit ID, and the date in the header of
>>> every output file (along with all the parameters used by the script).
>>>
>>> It wasn't much, and I don't know how many people ever actually
>>> implemented it, but it did allow you to keep track of which versions of
>>> which scripts had generated which output files in a systematic way.
>>>
>>> So here we are today in what I hope is research computing's Bronze Age,
>>> and I'm curious: what do you all actually do to keep track of data
>>> provenance? What tools or methods do you use to record which programs
>>> produced which output files from which input files with which settings and
>>> parameters? I was excited about the Open Provenance effort circa 2006-07 (
>>> https://openprovenance.org/opm/), but it never seemed to catch on.
>>> What are people using instead?
>>>
>>> Thanks,
>>>
>>> Greg
>>>
>>> --
>>> If you cannot be brave – and it is often hard to be brave – be kind.
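
For what it's worth, the "print the script's name, the commit ID, and the date in the header of every output file" idea from Greg's message carries over to git fairly directly. Below is a minimal Python sketch under a few assumptions: the script lives in a git working copy (otherwise the commit is recorded as "unknown"), the output is a text file where "#" comment lines are acceptable, and the function names and header layout are made up for this example.

```python
import datetime
import subprocess


def current_commit():
    """Return the current git commit hash, or 'unknown' outside a repository."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or we are not inside a repository
        return "unknown"
    return result.stdout.strip()


def provenance_header(script, params, commit=None, when=None):
    """Build '#'-prefixed header lines: script name, commit, date, parameters."""
    commit = commit if commit is not None else current_commit()
    when = when or datetime.datetime.now(datetime.timezone.utc)
    lines = [
        f"# script: {script}",
        f"# commit: {commit}",
        f"# date: {when.isoformat()}",
    ]
    # One line per parameter, sorted so the header is reproducible
    lines += [f"# param {name}: {value}" for name, value in sorted(params.items())]
    return "\n".join(lines) + "\n"
```

A script would write provenance_header(sys.argv[0], params) as the first thing in each output file before the data itself. This only records the committed version, so it says nothing about uncommitted local changes.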
>>>
>>>
>>> ------------------------------------------
>>> The Carpentries: discuss
>>> Permalink:
>>> https://carpentries.topicbox.com/groups/discuss/Te1cade367c0ab4ee-M703907d77763bffcdf143f1c
>>> Delivery options:
>>> https://carpentries.topicbox.com/groups/discuss/subscription
>>>

--
Sent from my phone, sorry for brevity or typos.

------------------------------------------
The Carpentries: discuss
Permalink:
https://carpentries.topicbox.com/groups/discuss/Te1cade367c0ab4ee-M04c05ccbb9e8c622234878cf
Delivery options:
https://carpentries.topicbox.com/groups/discuss/subscription
