Hi,

Back in the Stone Age, Software Carpentry's lessons spent a few minutes discussing data provenance:

- Include the string '$Id$' in every source code file. With keyword substitution enabled, Subversion would automatically fill in the revision details on every commit, turning it into something like '$Id: <file> 12345 <date> <author> $'.

- Print the script's name, the commit ID, and the date in the header of every output file (along with all the parameters used by the script).

It wasn't much, and I don't know how many people ever actually implemented it, but it did let you systematically keep track of which versions of which scripts had generated which output files.
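For anyone who never saw it, the practice looked roughly like the sketch below. This is a minimal illustration rather than the code from the original lessons: the function names (provenance_header, write_output), the "# key: value" header layout, and the example parameters are all my own assumptions, and the revision string is simply passed in, so it could come from '$Id$' expansion or from any other version control system.

```python
import sys
from datetime import datetime, timezone

# Placeholder that Subversion would rewrite on commit if keyword
# substitution is enabled for this file (assumption: it is).
REVISION = "$Id$"


def provenance_header(script, revision, params, when=None):
    """Return comment lines recording what produced an output file."""
    when = when or datetime.now(timezone.utc)
    lines = [
        f"# script: {script}",
        f"# revision: {revision}",
        f"# date: {when.isoformat()}",
    ]
    # Sort parameter names so the header is stable from run to run.
    for name in sorted(params):
        lines.append(f"# param {name}: {params[name]}")
    return lines


def write_output(stream, rows, script, revision, params, when=None):
    """Write the provenance header, then comma-separated rows, to a text stream."""
    for line in provenance_header(script, revision, params, when):
        stream.write(line + "\n")
    for row in rows:
        stream.write(",".join(str(value) for value in row) + "\n")


if __name__ == "__main__":
    # Hypothetical parameters and data, purely for illustration.
    settings = {"threshold": 0.5, "seed": 42}
    write_output(sys.stdout, [(1, 2.5), (2, 3.5)], sys.argv[0], REVISION, settings)
```

Because every header line starts with '#', most tools that read the data can skip it as a comment, while a person (or grep) can still recover which script, revision, and settings produced the file.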

So here we are today in what I hope is research computing's Bronze Age, and I'm curious: what do you all actually do to keep track of data provenance?  What tools or methods do you use to record which programs produced which output files from which input files with which settings and parameters?  I was excited about the Open Provenance effort circa 2006-07 (https://openprovenance.org/opm/), but it never seemed to catch on.  What are people using instead?

Thanks,

Greg

--
If you cannot be brave – and it is often hard to be brave – be kind.


------------------------------------------
The Carpentries: discuss
Permalink: 
https://carpentries.topicbox.com/groups/discuss/Te1cade367c0ab4ee-M703907d77763bffcdf143f1c