I have been using Nipype, and contributed a bit to it a few years ago; there we
use the W3C PROV Data Model:

https://nipype.readthedocs.io/en/latest/devel/provenance.html

http://prov.readthedocs.io/en/latest/index.html

https://www.w3.org/TR/prov-dm/
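
For anyone who has not seen PROV before, here is a minimal sketch of the core idea: entities (files), an activity (a processing step), and the `used` / `wasGeneratedBy` relations between them. It builds a PROV-JSON-style dictionary with the standard library only, rather than calling the `prov` package linked above. The `ex` prefix, the entity names, and the `prov_record` helper are all hypothetical, so treat it as an illustration and not as what Nipype does internally.

```python
import json


def prov_record(inputs, outputs, activity):
    """Build a PROV-JSON-style dict linking one activity to its inputs and outputs."""
    activity_id = f"ex:{activity}"
    doc = {
        "prefix": {"ex": "http://example.org/"},  # hypothetical namespace
        "entity": {},
        "activity": {activity_id: {}},
        "used": {},
        "wasGeneratedBy": {},
    }
    for i, name in enumerate(inputs):
        doc["entity"][f"ex:{name}"] = {}
        # The activity consumed this entity.
        doc["used"][f"_:u{i}"] = {
            "prov:activity": activity_id,
            "prov:entity": f"ex:{name}",
        }
    for i, name in enumerate(outputs):
        doc["entity"][f"ex:{name}"] = {}
        # This entity was produced by the activity.
        doc["wasGeneratedBy"][f"_:g{i}"] = {
            "prov:entity": f"ex:{name}",
            "prov:activity": activity_id,
        }
    return doc


if __name__ == "__main__":
    record = prov_record(["raw_image"], ["smoothed_image"], "smooth")
    print(json.dumps(record, indent=2))
```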

Cheers


On Mon, 13 Aug 2018, 04:19 Robert M. Flight via discuss, <
[email protected]> wrote:

> Hi Greg,
>
> I've been doing a bunch of work where we really want to track which
> versions of things were used, and also the basic methods, and finally what
> input data was used.
>
> What I ended up implementing was a bunch of JSON metadata.
>
> At raw file copying, a JSON metadata file gets generated with the original
> location, new location, and the SHA256 checksum of the original data.
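
The copy step described above could look roughly like the following sketch. The field names (`original_location`, `new_location`, `sha256`), the `.json` sidecar naming, and the `copy_with_metadata` helper are assumptions made for illustration; the actual layout of the metadata file is not given in the message.

```python
import hashlib
import json
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_metadata(source, destination):
    """Copy source to destination and write a JSON sidecar next to the copy."""
    source, destination = Path(source), Path(destination)
    metadata = {
        "original_location": str(source.resolve()),
        "new_location": str(destination.resolve()),
        "sha256": sha256_of(source),  # checksum of the original data
    }
    shutil.copy2(source, destination)
    sidecar = destination.with_name(destination.name + ".json")
    sidecar.write_text(json.dumps(metadata, indent=2))
    return sidecar
```

Later steps can then load the sidecar, add their own keys (for example the converter version), and write it back, which matches the "gets added to the raw copying metadata" pattern.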
>
> After a transformation step, more metadata is extracted during processing,
> including which version of the software did the conversion. This gets added
> to the raw copying metadata.
>
> A final processing occurs using a custom R package, and for that package,
> I have a git hook that increments the package version on every commit, so
> every commit has a corresponding version number. Also, because it is a
> local custom pkg, if I have a clean git repo, the git SHA is added to the
> pkg metadata at install. During processing, I have a function that gets the
> parent pkg metadata, including the SHA if it exists, and adds it to another
> JSON metadata file. It also takes the main data class object that gets
> processed, strips the data bits, and writes a JSON representation of the
> main class (so all the methods are encoded as JSON), which also becomes
> part of the JSON metadata, in addition to saving a binary representation of
> the class with the data attached. All this is added to the previously
> existing metadata.
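
The "record the git SHA only when the repo is clean" rule is worth spelling out. The setup above is an R package with a git hook; the sketch below shows the same check in Python instead, and the `clean_git_sha` helper is a hypothetical name. It relies only on the standard `git status --porcelain` and `git rev-parse HEAD` commands.

```python
import subprocess


def clean_git_sha(repo_dir="."):
    """Return HEAD's SHA if repo_dir is a clean git work tree, else None."""

    def git(*args):
        result = subprocess.run(
            ["git", *args],
            cwd=repo_dir,
            check=True,
            capture_output=True,
            text=True,
        )
        return result.stdout.strip()

    try:
        if git("status", "--porcelain"):
            # Uncommitted or untracked changes: the SHA would not
            # describe the code that actually ran.
            return None
        return git("rev-parse", "HEAD")
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or repo_dir is not inside a repository.
        return None
```

The returned value can be stored under its own key in the JSON metadata, with `None` signalling that the exact code state is unknown.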
>
> Ideally, I would be capturing the version numbers for all of the pkgs
> that my code imports as well, but I haven't gone that far.
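
Capturing dependency versions is usually a small amount of code. The message is about R packages; as a rough Python analogue, the sketch below asks `importlib.metadata` for the version of every top-level module that is currently imported. The `imported_package_versions` helper is hypothetical, and it assumes the distribution name matches the import name, which is not always true, so some packages will be missed.

```python
import sys
from importlib import metadata


def imported_package_versions():
    """Map imported top-level module names to installed distribution versions."""
    top_level = {name.split(".")[0] for name in list(sys.modules)}
    versions = {}
    for module_name in sorted(top_level):
        try:
            versions[module_name] = metadata.version(module_name)
        except metadata.PackageNotFoundError:
            # Standard-library module, or the distribution is published
            # under a different name than the one used to import it.
            continue
    return versions
```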
>
> Cheers,
>
> -Robert
>
> On Sun, Aug 12, 2018 at 7:40 PM Damien Irving via discuss <
> [email protected]> wrote:
>
>> Hi Greg,
>>
>> I've written a Data Carpentry lesson on data provenance, which makes use
>> of a very simple package I've written called cmdline-provenance:
>>
>>    - Lesson:
>>    https://data-lessons.github.io/python-aos-lesson/09-provenance/index.html
>>    - Package: http://cmdline-provenance.readthedocs.io/en/latest/
>>
>>
>> Cheers,
>> Damien
>>
>> On Sun, Aug 12, 2018 at 9:13 AM, Greg Wilson <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> Back in the Stone Age, Software Carpentry's lessons spent a few minutes
>>> discussing data provenance:
>>>
>>> - Include the string '$Id:$' in every source code file - Subversion
>>> would automatically fill in the revision ID on every commit to turn it into
>>> something like '$Id: 12345'.
>>>
>>> - Print the script's name, the commit ID, and the date in the header of
>>> every output file (along with all the parameters used by the script).
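
The second bullet can be sketched in a few lines. The `#` comment prefix, the key names, and the `provenance_header` helper are assumptions for illustration; the commit ID is passed in by the caller rather than looked up, since how it is obtained depends on the version control system in use.

```python
import sys
from datetime import datetime, timezone


def provenance_header(commit_id, params, script=None):
    """Return '#'-prefixed header lines describing how an output file was made."""
    script = script or sys.argv[0]
    lines = [
        f"# script: {script}",
        f"# commit: {commit_id}",
        f"# date: {datetime.now(timezone.utc).isoformat()}",
    ]
    # One line per parameter, sorted so the header is stable between runs.
    lines += [f"# param {key}: {value}" for key, value in sorted(params.items())]
    return "\n".join(lines) + "\n"
```

Writing `provenance_header(...)` as the first thing in each output file keeps the record attached to the data, at the cost of requiring downstream readers to skip comment lines.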
>>>
>>> It wasn't much, and I don't know how many people ever actually
>>> implemented it, but it did allow you to keep track of which versions of
>>> which scripts had generated which output files in a systematic way.
>>>
>>> So here we are today in what I hope is research computing's Bronze Age,
>>> and I'm curious: what do you all actually do to keep track of data
>>> provenance?  What tools or methods do you use to record which programs
>>> produced which output files from which input files with which settings and
>>> parameters?  I was excited about the Open Provenance effort circa 2006-07 (
>>> https://openprovenance.org/opm/), but it never seemed to catch on.
>>> What are people using instead?
>>>
>>> Thanks,
>>>
>>> Greg
>>>
>>> --
>>> If you cannot be brave – and it is often hard to be brave – be kind.
>>>
>>>
>>> ------------------------------------------
>>> The Carpentries: discuss
>>> Permalink:
>>> https://carpentries.topicbox.com/groups/discuss/Te1cade367c0ab4ee-M703907d77763bffcdf143f1c
>>> Delivery options:
>>> https://carpentries.topicbox.com/groups/discuss/subscription
>>>
-- 

Sent from my phone, sorry for brevity or typos.

------------------------------------------
The Carpentries: discuss
Permalink: 
https://carpentries.topicbox.com/groups/discuss/Te1cade367c0ab4ee-M04c05ccbb9e8c622234878cf
Delivery options: https://carpentries.topicbox.com/groups/discuss/subscription
