Hi,
Back in the Stone Age, Software Carpentry's lessons spent a few minutes
discussing data provenance:
- Include the string '$Id$' in every source code file: Subversion would
automatically expand it on every commit into something like
'$Id: 12345 $', filling in the revision ID.
- Print the script's name, the commit ID, and the date in the header of
every output file (along with all the parameters used by the script).
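For anyone who wants to try the second practice today, here is a minimal sketch in Python. It assumes the scripts live in a Git repository rather than Subversion (so the commit ID comes from `git rev-parse HEAD` instead of a `$Id$` keyword) and that the output format tolerates `#`-prefixed comment lines; the function names `current_commit` and `provenance_header` are hypothetical, not part of any existing tool.

```python
import datetime
import subprocess
import sys


def current_commit() -> str:
    """Return the Git commit ID of HEAD, or 'unknown' outside a repository."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # Git is not installed, or we are not inside a repository with commits.
        return "unknown"


def provenance_header(params: dict, comment: str = "#") -> str:
    """Build a comment block recording script name, commit ID, date, and parameters."""
    now = datetime.datetime.now(datetime.timezone.utc)
    lines = [
        f"script: {sys.argv[0]}",
        f"commit: {current_commit()}",
        f"date: {now.isoformat(timespec='seconds')}",
    ]
    # One line per parameter, sorted by name so headers from different runs diff cleanly.
    lines += [f"param {key}: {value!r}" for key, value in sorted(params.items())]
    return "\n".join(f"{comment} {line}" for line in lines) + "\n"


if __name__ == "__main__":
    # Hypothetical parameters; a real script would write this header to its output file.
    params = {"threshold": 0.5, "input": "data/raw.csv"}
    print(provenance_header(params), end="")
    print("sample,value")
```

If a run might use uncommitted edits, `git describe --always --dirty` is one way to make that visible, since it appends `-dirty` to the ID when the working tree has changes.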
It wasn't much, and I don't know how many people ever actually
implemented it, but it did give you a systematic way to keep track of
which versions of which scripts had generated which output files.
So here we are today in what I hope is research computing's Bronze Age,
and I'm curious: what do you all actually do to keep track of data
provenance? What tools or methods do you use to record which programs
produced which output files from which input files with which settings
and parameters? I was excited about the Open Provenance effort circa
2006-07 (https://openprovenance.org/opm/), but it never seemed to catch
on. What are people using instead?
Thanks,
Greg
--
If you cannot be brave – and it is often hard to be brave – be kind.
------------------------------------------
The Carpentries: discuss
Permalink:
https://carpentries.topicbox.com/groups/discuss/Te1cade367c0ab4ee-M703907d77763bffcdf143f1c