Hi folks, I'm interested in techniques for sharing data in scientific workflows. Tools like git/github and docker/repo2docker are great for sharing computational environments and moderately sized data, but they aren't well suited to sharing (say) hundreds of gigabytes of data. What do people do?
I have in mind something like this: a scientist on a good network spins up a Jupyter server in a Docker container built from a GitHub repository using repo2docker. Inside the container the scientist provides some authorization credentials, and if the credentials are valid the data for the workflow appears in the container, perhaps with some form of read/write access if the credentials are really good.

If we only want to provide publicly accessible data in read-only mode, we could just dump the data onto a web server anywhere and pull it down over HTTP. What I don't know is the right way to do this when we want to limit access to the data and sometimes allow writes as well. I'm also interested in the case where the scientist is remote -- i.e., certain people are allowed to use our compute cluster, possibly with data they have locally or with other data out there somewhere...

Any and all thoughts or pointers appreciated. Thanks! Sorry if the question is silly or too vague.

-- Aaron Watters
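P.S. To make the "pull it down over HTTP, maybe with credentials" case concrete, here is a minimal Python sketch of the sort of thing I have in mind. The URL, the environment variable, and the Bearer-token scheme are all placeholders I made up, not a real service; a real deployment would substitute its own endpoint and credential mechanism:

    # Minimal sketch: fetch a large dataset over HTTP inside the container,
    # optionally presenting a token for access-controlled data.
    import os
    import requests

    DATA_URL = "https://data.example.org/dataset.tar.gz"  # hypothetical endpoint
    TOKEN = os.environ.get("DATA_ACCESS_TOKEN")           # credential supplied to the container

    # Send an Authorization header only if a token was provided;
    # without one, this degrades to the plain public read-only case.
    headers = {"Authorization": f"Bearer {TOKEN}"} if TOKEN else {}

    # Stream the download so a multi-gigabyte file never sits in memory at once.
    with requests.get(DATA_URL, headers=headers, stream=True, timeout=60) as resp:
        resp.raise_for_status()  # a 401/403 here means the credentials were rejected
        with open("dataset.tar.gz", "wb") as out:
            for chunk in resp.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
                out.write(chunk)

The read-only half is easy enough; it's the "really good credentials also get write access" half, and doing all of this in a way that plays nicely with repo2docker-built images, that I don't have a good answer for.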