[discuss] tracking data provenance

Greg Wilson Sun, 12 Aug 2018 06:15:11 -0700

Hi,

Back in the Stone Age, Software Carpentry's lessons spent a few minutes discussing data provenance:

- Include the string '$Id:$' in every source code file - Subversion would automatically fill in the revision ID on every commit to turn it into something like '$Id: 12345'.

- Print the script's name, the commit ID, and the date in the header of every output file (along with all the parameters used by the script).

It wasn't much, and I don't know how many people ever actually implemented it, but it did allow you to keep track of which versions of which scripts had generated which output files in a systematic way.
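
For concreteness, the second practice might look roughly like the sketch below. It is only an illustration under stated assumptions: it uses Git instead of Subversion to look up the commit ID, the function names current_commit and provenance_header are made up for this example, and the parameters shown are hypothetical.

```python
import datetime
import subprocess
import sys


def current_commit():
    """Return the current Git commit ID, or 'unknown' if it cannot be read.

    Assumption: the script lives in a Git working copy and `git` is on PATH.
    """
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def provenance_header(script, commit, when, params):
    """Build comment lines recording what produced an output file."""
    lines = [
        f"# script: {script}",
        f"# commit: {commit}",
        f"# date: {when.isoformat()}",
    ]
    # One line per parameter, sorted so the header is reproducible.
    for name in sorted(params):
        lines.append(f"# param {name}: {params[name]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    header = provenance_header(
        script=sys.argv[0],
        commit=current_commit(),
        when=datetime.datetime.now(datetime.timezone.utc),
        params={"threshold": 0.5, "input": "data.csv"},  # hypothetical settings
    )
    # Write the header first, then the script's real output.
    sys.stdout.write(header)
```

Writing the header as comment lines keeps the output readable by tools that skip lines starting with '#', though that convention is itself an assumption about the output format.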

So here we are today in what I hope is research computing's Bronze Age, and I'm curious: what do you all actually do to keep track of data provenance? What tools or methods do you use to record which programs produced which output files from which input files with which settings and parameters? I was excited about the Open Provenance effort circa 2006-07 (https://openprovenance.org/opm/), but it never seemed to catch on. What are people using instead?


Thanks,

Greg

--
If you cannot be brave – and it is often hard to be brave – be kind.


------------------------------------------
The Carpentries: discuss
Permalink: https://carpentries.topicbox.com/groups/discuss/Te1cade367c0ab4ee-M703907d77763bffcdf143f1c