Hi Naupaka; thanks for your mail.  I played with Sumatra a couple of times as well, but it didn't stick - what I'm chasing now are things people are actually using in small- to medium-sized projects.  (The way CERN and STScI handle metadata is cool, and I'm grateful for it, but it doesn't scale down to what most of us do in the lab.)  The sessionInfo() trick is cool - what else are people using?

Thanks,

Greg


On 2018-08-12 9:55 AM, naupaka via discuss wrote:
I remember playing with Sumatra several years ago. I believe the approach is to track all that metadata in a SQLite db and then make it browsable/accessible with a Django web app.

http://neuralensemble.org/sumatra/

In the R world many folks have taken to appending `sessionInfo()` or `devtools::session_info()` to the end of an Rmd file to track packages attached, etc. The latter also gives SHAs for packages installed from GitHub. Wouldn’t be that hard to also start including a shell chunk with `git rev-parse HEAD` to include the local repo commit info.
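
For anyone not working in R, here is a minimal Python sketch of the same idea: collect the interpreter version, the versions of a few named packages, and the output of `git rev-parse HEAD` in one place. The function names `git_commit` and `session_info` are hypothetical, chosen for illustration; this is not part of Sumatra or any other tool mentioned in this thread.

```python
import platform
import subprocess
from importlib import metadata


def git_commit(repo_dir="."):
    """Return the current commit SHA, or "unknown" if git or the repo is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, repo_dir is missing, or this is not a repository
        return "unknown"
    return result.stdout.strip()


def session_info(packages=()):
    """Collect interpreter, platform, commit, and package versions in a dict."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "commit": git_commit(),
        "packages": {},
    }
    for name in packages:
        try:
            info["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info["packages"][name] = "not installed"
    return info


if __name__ == "__main__":
    # Package names here are only examples; list whatever the analysis imports.
    for key, value in session_info(["numpy", "pandas"]).items():
        print(f"{key}: {value}")
```

Printing this at the end of a notebook or script plays roughly the role that `sessionInfo()` plays at the end of an Rmd file.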

Here’s the old discussion on this I remember from several years ago:
https://github.com/swcarpentry/DEPRECATED-site/issues/1085

Best,
Naupaka

On Aug 12, 2018, at 6:30 AM, Bruce Becker via discuss wrote:

Hi Greg, all
I'm not sure about the Bronze Age, but in the Baroque era my understanding is that this is the job of metadata. You need a lot of machinery to do this, but in this era, data never lives "nakedly"; it is always accompanied by metadata which describes it. So, you look up data by its persistent identifier, in repositories, and deposit it, along with its changelog or whatever, in repositories.

I am the first to concede that many, if not the vast majority, of data civilisations will never reach the Baroque age - and perhaps others will skip it altogether - but this happens to be the civilisation I'm writing to you from. I'd hazard the suggestion that the Baroque Age is also known as the Open Science age, just to be prickly.

Have a great Sunday!
Bruce

On Sun, 12 Aug 2018 at 15:15, Greg Wilson wrote:

    Hi,

    Back in the Stone Age, Software Carpentry's lessons spent a few minutes
    discussing data provenance:

    - Include the string '$Id:$' in every source code file - Subversion
    would automatically fill in the revision ID on every commit to turn it
    into something like '$Id: 12345 $'.

    - Print the script's name, the commit ID, and the date in the header of
    every output file (along with all the parameters used by the script).

    It wasn't much, and I don't know how many people ever actually
    implemented it, but it did allow you to keep track of which versions of
    which scripts had generated which output files in a systematic way.
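
A minimal sketch of the output-header idea from the list above, assuming Python and a plain-text output format in which lines starting with `#` are treated as comments. The function name `write_provenance_header` and its parameters are hypothetical; the commit ID is passed in by the caller, however it was obtained.

```python
import datetime
import io


def write_provenance_header(out, script, commit_id, params, comment="# "):
    """Write script name, commit ID, UTC timestamp, and parameters as comment lines."""
    now = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds")
    out.write(f"{comment}script: {script}\n")
    out.write(f"{comment}commit: {commit_id}\n")
    out.write(f"{comment}date: {now}\n")
    # Sort parameter names so the header is stable from run to run.
    for key in sorted(params):
        out.write(f"{comment}param {key}={params[key]}\n")


if __name__ == "__main__":
    # Example values only; a real script would pass its own name and settings.
    buffer = io.StringIO()
    write_provenance_header(buffer, "analyze.py", "abc123", {"threshold": 0.5})
    buffer.write("x,y\n1,2\n")
    print(buffer.getvalue(), end="")
```

Anything reading the file later has to skip the comment lines, which is the main cost of keeping the provenance inside the data file rather than alongside it.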

    So here we are today in what I hope is research computing's Bronze Age,
    and I'm curious: what do you all actually do to keep track of data
    provenance?  What tools or methods do you use to record which programs
    produced which output files from which input files with which settings
    and parameters?  I was excited about the Open Provenance effort circa
    2006-07 (https://openprovenance.org/opm/), but it never seemed to catch
    on.  What are people using instead?

    Thanks,

    Greg

-- If you cannot be brave – and it is often hard to be brave – be kind.


    ------------------------------------------
    The Carpentries: discuss
    Permalink:
    https://carpentries.topicbox.com/groups/discuss/Te1cade367c0ab4ee-M703907d77763bffcdf143f1c
    Delivery options:
    https://carpentries.topicbox.com/groups/discuss/subscription




------------------------------------------
The Carpentries: discuss
Permalink: 
https://carpentries.topicbox.com/groups/discuss/Te1cade367c0ab4ee-M64f6d20b399ac7970f99a297
Delivery options: https://carpentries.topicbox.com/groups/discuss/subscription
