Hi Naupaka; thanks for your mail. I played with Sumatra a couple of
times as well, but it didn't stick - what I'm chasing now are things
people are actually using in small- to medium-sized projects. (The way
CERN and STScI handle metadata is impressive, and I'm grateful for it, but it
doesn't scale down to what most of us do in the lab.) The sessionInfo()
trick is cool - what else are people using?
Thanks,
Greg
On 2018-08-12 9:55 AM, naupaka via discuss wrote:
I remember playing with Sumatra several years ago. I believe the
approach is to track all that metadata in a SQLite db and then make it
browsable/accessible with a Django web app.
http://neuralensemble.org/sumatra/
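For anyone who hasn't seen that style of tool, here is a minimal sketch of the general idea: one SQLite row per run, with a query to find which runs wrote a given file. It uses only Python's standard `sqlite3` module. The table layout, the function names `record_run` and `runs_producing`, and the example commit ID are all hypothetical; this is not Sumatra's actual schema or API, and there is no web front end.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical run-tracking table: one row per execution of an analysis script.
# This is NOT Sumatra's record format, just an illustration of the approach.
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    script     TEXT NOT NULL,
    commit_id  TEXT,
    parameters TEXT NOT NULL,  -- JSON-encoded dict of settings
    inputs     TEXT NOT NULL,  -- JSON-encoded list of input paths
    outputs    TEXT NOT NULL,  -- JSON-encoded list of output paths
    started_at TEXT NOT NULL   -- ISO 8601 timestamp, UTC
);
"""


def record_run(conn, script, commit_id, parameters, inputs, outputs):
    """Insert one run record and return its row id."""
    cur = conn.execute(
        "INSERT INTO runs (script, commit_id, parameters, inputs, outputs, started_at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (
            script,
            commit_id,
            json.dumps(parameters, sort_keys=True),
            json.dumps(inputs),
            json.dumps(outputs),
            datetime.now(timezone.utc).isoformat(),
        ),
    )
    conn.commit()
    return cur.lastrowid


def runs_producing(conn, output_path):
    """Return (script, commit_id, parameters) for every run that wrote output_path."""
    rows = conn.execute("SELECT script, commit_id, parameters, outputs FROM runs")
    return [
        (script, commit_id, json.loads(params))
        for script, commit_id, params, outputs in rows
        if output_path in json.loads(outputs)
    ]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # use a file path for a persistent log
    conn.executescript(SCHEMA)
    # "abc1234" is a made-up commit ID for the demo.
    record_run(conn, "fit_model.py", "abc1234", {"alpha": 0.1},
               ["data/raw.csv"], ["results/fit.csv"])
    print(runs_producing(conn, "results/fit.csv"))
```

Storing the parameter and file lists as JSON text keeps the schema to one table; a real tool would more likely normalise inputs and outputs into their own tables so they can be indexed.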
In the R world many folks have taken to appending `sessionInfo()` or
`devtools::session_info()` to the end of an Rmd file to track which packages
are attached, etc. The latter also gives SHAs for packages installed from
GitHub. It wouldn’t be that hard to also add a shell chunk running
`git rev-parse HEAD` to record the local repo’s commit.
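Outside R, the same trick can be approximated by hand. Below is a rough Python analogue, swapped in for illustration: it is not part of `sessionInfo()` or `devtools::session_info()`. It reports the interpreter version, the platform, the installed versions of a caller-supplied list of packages (via the standard `importlib.metadata`), and the output of `git rev-parse HEAD`. The function names and the example package list are assumptions.

```python
import importlib.metadata
import platform
import subprocess
import sys


def git_commit(repo_dir="."):
    """Return the HEAD commit of repo_dir, or None if git or the repo is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git not installed, directory missing, or not a repo with commits
        return None
    return result.stdout.strip()


def session_info(packages):
    """Collect interpreter, platform, package versions, and repo commit."""
    versions = {}
    for name in packages:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
        "git_commit": git_commit(),
    }


if __name__ == "__main__":
    # Example package list; list whatever your analysis actually imports.
    for key, value in session_info(["numpy", "pandas"]).items():
        print(f"{key}: {value}")
```

Unlike R's `sessionInfo()`, this does not discover loaded packages automatically; you pass the list explicitly, which keeps the report short and deterministic.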
Here’s the old discussion on this I remember from several years ago:
https://github.com/swcarpentry/DEPRECATED-site/issues/1085
Best,
Naupaka
On Aug 12, 2018, at 6:30 AM, Bruce Becker via discuss
<[email protected]> wrote:
Hi Greg, all
I'm not sure about the Bronze Age, but in the Baroque era my
understanding is that this is the job of metadata. You need a lot of
machinery to do this, but in this era, data never lives "nakedly":
it is always accompanied by metadata which describes it. So, you
look up data in repositories by its persistent identifier, and you
deposit it in repositories along with its changelog or whatever.
I am the first to concede that many, if not the vast majority, of data
civilisations will never reach the Baroque Age - and perhaps others
will skip it altogether, but this happens to be the civilisation I'm
writing to you from. I'd hazard the suggestion that the Baroque Age
is also known as the Open Science age, just to be prickly.
Have a great Sunday!
Bruce
On Sun, 12 Aug 2018 at 15:15, Greg Wilson <[email protected]>
wrote:
Hi,
Back in the Stone Age, Software Carpentry's lessons spent a few minutes
discussing data provenance:
- Include the string '$Id:$' in every source code file; Subversion
would automatically fill in the revision ID on every commit to turn it
into something like '$Id: 12345'.
- Print the script's name, the commit ID, and the date in the header of
every output file (along with all the parameters used by the script).
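A minimal Python sketch of the second habit, for a modern git-based project: it builds a comment header with the script name, the current commit (from `git rev-parse HEAD`, or "unknown" outside a repository), a UTC timestamp, and every parameter, then writes it at the top of a tab-separated output file. The `# ` comment prefix, the header layout, and the function names are assumptions chosen for illustration.

```python
import datetime
import os
import subprocess
import sys


def commit_id():
    """HEAD commit of the current git repository, or 'unknown' outside one."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"
    return out or "unknown"


def provenance_header(params, comment="# "):
    """Build comment lines recording script name, commit, date, and parameters."""
    script = os.path.basename(sys.argv[0]) if sys.argv and sys.argv[0] else "interactive"
    now = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds")
    lines = [f"script: {script}", f"commit: {commit_id()}", f"date: {now}"]
    # Sorted so the header is stable from run to run.
    lines += [f"param {name} = {value!r}" for name, value in sorted(params.items())]
    return "".join(f"{comment}{line}\n" for line in lines)


def write_output(path, rows, params):
    """Write tab-separated rows to path, preceded by the provenance header."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(provenance_header(params))
        for row in rows:
            f.write("\t".join(str(x) for x in row) + "\n")
```

Calling `write_output("results.tsv", [(1, 2.5)], {"threshold": 0.05})` would produce a file whose first lines are `# script: ...`, `# commit: ...`, `# date: ...` and `# param threshold = 0.05`, followed by the data. Using a comment prefix means tools such as `pandas.read_csv(..., comment="#")` can still load the data directly.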
It wasn't much, and I don't know how many people ever actually
implemented it, but it did allow you to keep track of which versions of
which scripts had generated which output files in a systematic way.
So here we are today in what I hope is research computing's Bronze Age,
and I'm curious: what do you all actually do to keep track of data
provenance? What tools or methods do you use to record which programs
produced which output files from which input files with which settings
and parameters? I was excited about the Open Provenance effort circa
2006-07 (https://openprovenance.org/opm/), but it never seemed to catch
on. What are people using instead?
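One lightweight answer to "which program produced which outputs from which inputs with which settings" is a small sidecar file next to each output. The sketch below is a hypothetical convention, not an Open Provenance Model serialization: it writes `<output>.prov.json` recording the program, its parameters, the inputs it used, and the output it generated, identifying every file by its SHA-256 content hash so that later edits to an input are detectable. The `used`/`generated` keys only loosely echo OPM's "used" and "wasGeneratedBy" relations; the file naming and function names are assumptions.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 of a file's contents, read in chunks to handle large files."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_provenance(program, params, inputs, outputs):
    """Write '<output>.prov.json' beside each output and return the sidecar paths.

    Each record names the program, its parameters, the inputs it used, and the
    output it generated, with every file identified by its content hash.
    """
    used = {str(p): sha256_of(p) for p in inputs}
    sidecars = []
    for out in map(Path, outputs):
        record = {
            "program": program,
            "parameters": params,
            "used": used,
            "generated": {str(out): sha256_of(out)},
        }
        sidecar = out.with_name(out.name + ".prov.json")
        sidecar.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
        sidecars.append(sidecar)
    return sidecars


if __name__ == "__main__":
    # Demo in a throwaway directory; file names and parameters are made up.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "raw.csv")
        dst = Path(tmp, "summary.csv")
        src.write_text("x\n1\n2\n")
        dst.write_text("mean\n1.5\n")
        for sidecar in write_provenance("summarize.py", {"stat": "mean"}, [src], [dst]):
            print(sidecar.read_text())
```

Hashing costs a pass over each file, but it means the record stays meaningful even if files are renamed or copied between machines.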
Thanks,
Greg
--
If you cannot be brave – and it is often hard to be brave – be kind.
------------------------------------------
The Carpentries: discuss
Permalink:
https://carpentries.topicbox.com/groups/discuss/Te1cade367c0ab4ee-M703907d77763bffcdf143f1c
Delivery options:
https://carpentries.topicbox.com/groups/discuss/subscription