Re: [discuss] reproducible science platforms (especially for economics)?

Mike Henry Mon, 06 Aug 2018 12:29:14 -0700

While not specific to the OP's needs, for reproducibility in HPC
workflows I recommend Singularity
<https://github.com/singularityware/singularity#singularity---enabling-users-to-have-full-control-of-their-environment>,
which supports technologies like MPI and Nvidia GPUs. Singularity can run
Docker images, so any tooling and time you have invested in Docker will not
be wasted. Singularity is well supported on XSEDE resources, and our local
sysadmin had no trouble getting it working on our cluster.
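
To make the Docker point concrete, here is a minimal sketch in Python of
driving Singularity against a Docker Hub image. The helper names and the
python:3.7 image are only illustrations (assumptions, not part of any
project mentioned here); the sketch relies on the `singularity exec`
subcommand, its `--nv` flag for Nvidia GPUs, and `docker://` image URIs.

```python
import shutil
import subprocess


def singularity_exec_argv(docker_image, command, use_gpu=False):
    """Build the argv for running `command` inside a Docker image via Singularity.

    `docker_image` is a Docker Hub reference such as "python:3.7"
    (a hypothetical choice used only for illustration).
    """
    argv = ["singularity", "exec"]
    if use_gpu:
        argv.append("--nv")  # expose the host's Nvidia driver and devices
    argv.append("docker://" + docker_image)
    argv.extend(command)
    return argv


def run_in_image(docker_image, command, use_gpu=False):
    """Run the command if Singularity is installed; otherwise return None."""
    if shutil.which("singularity") is None:
        return None
    argv = singularity_exec_argv(docker_image, command, use_gpu)
    return subprocess.run(argv).returncode


# Show the command line that would be run, without executing anything.
print(" ".join(singularity_exec_argv("python:3.7", ["python3", "--version"])))
```

On a cluster login node you would call run_in_image(...) instead of just
printing the command; whether GPU passthrough works depends on how the site
has set things up.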


On Sat, Aug 4, 2018 at 2:11 PM, Jane Wyngaard <[email protected]> wrote:

> This project is still under development but is open for beta use and is
> intended for exactly what the OP requested:
> http://wholetale.org/
>
> Basically, Jupyter notebooks in Docker containers that can tie your data
> directly in from a remote host. So you get provenance for your code and
> data, plus an easily accessible place in the cloud to re-run them. They're
> best integrated with DataONE and Globus for data hosting, or you can upload
> your data to the container manually if it's not too big.
>
> That said, I'll also throw in a vote for the Open Science Framework for
> tying together code repos, project documentation, and some data hosting
> platforms.
>
> Depends on needs
>
> jane
>
> On 3 August 2018 at 16:50, Sebastian Schmeier via discuss <
> [email protected]> wrote:
>
>> A bit of a different domain but I wrote a tutorial for reproducible
>> research in bioinformatics using conda package management, snakemake for
>> workflow management and containerization using Singularity here:
>> http://reproducible.sschmeier.com/
>>
>> Cheers
>>
>>
>>    Sebastian Schmeier
>>
>>
>> ~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~
>>
>>  Dr. Sebastian Schmeier
>>
>>  Research Group Leader
>>
>>  Senior Lecturer in Bioinformatics/Genomics
>>
>>  Institute of Natural and Mathematical Sciences
>>
>>  Massey University Auckland
>>
>>  [email protected] | https://sschmeier.com
>>
>> ~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~
>>
>>
>> On Sat, 4 Aug 2018, 06:41 C. Titus Brown, <[email protected]> wrote:
>>
>>> Hi Sumana,
>>>
>>> I’m a big fan of mybinder.org, and the software stack it’s based on
>>> (BinderHub and in particular repo2docker).
>>>
>>> repo2docker takes a repo with a rather standard & flexible
>>> “configuration spec” and builds a Docker image out of it.
>>>
>>> mybinder.org, which is a free service running software called
>>> BinderHub, stage manages the process of getting a GitHub repo name,
>>> building the docker image, and then launching one or more containers of
>>> that image that run either Jupyter or RStudio (or potentially any other Web
>>> site).
>>>
>>> repo2docker config info: https://repo2docker.readthedocs.io/en/latest/config_files.html
>>>
>>> mybinder.org: https://mybinder.org/
>>>
>>> binderhub: https://github.com/jupyterhub/binderhub
>>>
>>> There’s a collection of binder examples here,
>>>
>>> https://github.com/binder-examples
>>>
>>> and I would suggest taking a look at either
>>> https://github.com/binder-examples/jupyterlab or
>>> https://github.com/binder-examples/r - just go click the little “launch
>>> binder” button on the README!
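
The "launch binder" button is just a link, so a repository README can point
at its own Binder. A minimal sketch, assuming the mybinder.org URL layout
`/v2/gh/<org>/<repo>/<ref>` with an optional `urlpath` query parameter; the
function name and the "master" ref in the example are assumptions for
illustration, so check the branch name of the repo you actually use.

```python
from urllib.parse import quote


def binder_launch_url(org, repo, ref, urlpath=None):
    """Return a mybinder.org launch URL for a GitHub repository.

    `ref` is a branch, tag, or commit. `urlpath` optionally selects the
    interface to open, e.g. "lab" for JupyterLab or "rstudio" for RStudio.
    """
    url = "https://mybinder.org/v2/gh/{}/{}/{}".format(
        quote(org, safe=""),
        quote(repo, safe=""),
        quote(ref, safe=""),  # branch names may contain "/", so encode it
    )
    if urlpath:
        url += "?urlpath=" + quote(urlpath, safe="")
    return url


# "master" is an assumed branch name, used only for illustration.
print(binder_launch_url("binder-examples", "r", "master", urlpath="rstudio"))
```

Pasting the resulting URL into a README (usually behind the badge image)
gives readers a one-click way to rebuild and open the environment.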
>>>
>>> All open source etc etc of course.
>>>
>>> best,
>>> —titus
>>>
>>> > On Aug 1, 2018, at 9:39 PM, Sumana Harihareswara <[email protected]>
>>> wrote:
>>> >
>>> > Friends and neighbors: what platforms for reproducible science
>>> (including scientific computing) do you recommend? As in, "in order for you
>>> to verify my results, you can go to this webpage/repository/etc. and
>>> download the data I used and the code I wrote, and run the same
>>> models/experiments to verify and reproduce my findings"? And is there an
>>> existing platform and site that economists in particular gravitate toward,
>>> and does it make a difference if the language in question is Python?
>>> >
>>> > I'm helping a client who wants to avoid reinventing the wheel. I
>>> include a note about them & their current approach at the bottom of this
>>> email.
>>> >
>>> > There seem to be many different software projects and archives I
>>> should explore, such as:
>>> >
>>> > * LabTrove
>>> > http://www.labtrove.org/aboutus/ (example: http://malaria.ourexperiment.org/)
>>> >
>>> > * Dryad
>>> > https://datadryad.org/
>>> >
>>> > * Open Science Framework
>>> > https://osf.io/
>>> >
>>> > * figshare
>>> > https://figshare.com/
>>> >
>>> > * RunMyCode
>>> > http://www.runmycode.org/
>>> >
>>> > * DAT
>>> > https://datproject.org/
>>> >
>>> > * finding a particular existing Dataverse or VisTrails instance?
>>> > https://dataverse.org/ https://nyu.reproduciblescience.org/vistrails/
>>> >
>>> > * ScienceFair, maybe?
>>> > http://sciencefair-app.com/
>>> >
>>> > * Stencila, maybe?
>>> > https://stenci.la/
>>> >
>>> > * use GitHub plus Jupyter notebooks or something like ReproZip
>>> > https://www.reprozip.org/
>>> >
>>> >
>>> >
>>> > Sorry if I'm lumping together things that are quite different from
>>> each other! I'm at a bit of a loss here and may have missed a foundational
>>> explanation/directory.
>>> >
>>> > My client's currently got a standalone GitHub repository:
>>> > https://github.com/econ-ark/REMARK . I'll excerpt from their README to
>>> > explain:
>>> >
>>> >
>>> >
>>> >> This is the resting place for self-contained and complete projects
>>> written using [our tools].
>>> >>
>>> >> Each of these resides in its own subdirectory in the REMARKs directory
>>> >>
>>> >> Types of content include (see below for elaboration):
>>> >>
>>> >>     Explorations
>>> >>         Use the Econ-ARK/HARK toolkit to demonstrate some set of
>>> modeling ideas
>>> >>     Replications
>>> >>         Attempts to replicate the results of published papers written
>>> using other tools
>>> >>     Reproductions
>>> >>         Code that reproduces the results of some paper that was
>>> originally written using the toolkit
>>> >>
>>> >>
>>> > ...
>>> >
>>> >
>>> >> Code archives should contain:
>>> >>
>>> >>     All information required to get the replication code to run
>>> >>     An indication of how long that takes on some particular machine
>>> >>
>>> >> Jupyter notebook(s) should:
>>> >>
>>> >>     Explain their own content ("This notebook uses the associated
>>> replication archive to demonstrate three central results from the paper of
>>> [original author]: The consumption function and the distribution of wealth")
>>> >>     Be usable for someone wanting to explore the replication
>>> interactively (so, no cell should take more than a minute or two to execute
>>> on a laptop)
>>> >>
>>> >>
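
The two README requirements quoted above (say how long the code takes on
some particular machine, and keep each notebook cell to a minute or two)
can be checked mechanically. A minimal sketch using only the Python
standard library; the function names and the 120-second budget are
assumptions chosen to match "a minute or two", not anything taken from the
REMARK repository.

```python
import platform
import time


def timed(label, func, *args, **kwargs):
    """Run func and return (result, report) with timing and machine details."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    report = {
        "label": label,
        "seconds": round(elapsed, 3),
        "machine": platform.machine(),   # e.g. CPU architecture
        "system": platform.system(),     # e.g. operating system name
        "python": platform.python_version(),
    }
    return result, report


def within_budget(report, budget_seconds=120.0):
    """True if the timed step fits the (assumed) per-cell time budget."""
    return report["seconds"] <= budget_seconds


# Toy stand-in for a model run; replace with the real replication step.
result, report = timed("toy computation", sum, range(1_000_000))
print(report, "within budget:", within_budget(report))
```

Saving the printed report next to the replication archive records both the
run time and the machine it was measured on.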
>>> >
>>> > Much thanks. I would be happy to hear, for instance, "use this" or "it
>>> depends very heavily on your needs, but DON'T use these because they're
>>> vaporware/super-buggy".
>>> >
>>> >
>>> >
>>> > --
>>> > Sumana Harihareswara
>>> > Changeset Consulting
>>> >
>>> > https://changeset.nyc
>>> >
>>> >
>>> > P.S. Tried to send this earlier and it didn't seem to post, so, sorry
>>> if this double-posts.
>>>
>>> ------------------------------------------
>>> The Carpentries: discuss
>>> Permalink: https://carpentries.topicbox.com/groups/discuss/T45d1f9e935d7181b-M1da834a8946c3647d96481fc
>>> Delivery options: https://carpentries.topicbox.com/groups/discuss/subscription
>>>

------------------------------------------
The Carpentries: discuss
Permalink: https://carpentries.topicbox.com/groups/discuss/T45d1f9e935d7181b-M7a44a206a83d8bcb0727de45
Delivery options: https://carpentries.topicbox.com/groups/discuss/subscription
