Hi folks, I'm interested in techniques for sharing data in scientific workflows. Tools like git/github and docker/repo2docker are great for sharing computational environments and moderately sized data, but they aren't well suited to sharing (say) hundreds of gigabytes of data. What do people do?
I have in mind something like this: a scientist on a good network spins up a Jupyter server in a Docker container built from a GitHub repository using repo2docker. Inside the container the scientist provides some authorization credentials, and if the credentials are valid the data for the workflow appears in the container, perhaps with some form of read/write access if the credentials are really good.

If we only want to provide publicly accessible data in read-only mode, we could just dump the data onto a web server anywhere and pull it down over HTTP. What I don't know is the right way to do this when we want to limit access to the data and sometimes allow writes as well. I'm also interested in the case where the scientist is remote -- i.e., certain people are allowed to use our compute cluster, possibly with data they have locally or with other data out there somewhere...

Any and all thoughts or pointers appreciated. Thanks! Sorry if the question is silly or too vague.

-- Aaron Watters
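P.S. To make the "pull it down over HTTP, maybe with credentials" case concrete, here is a minimal Python sketch of the sort of thing I have in mind. The URL, the environment variable, and the Bearer-token scheme are all placeholders I made up, not a real service; a real deployment would substitute its own endpoint and credential mechanism:

    # Minimal sketch: fetch a large dataset over HTTP inside the container,
    # optionally presenting a token for access-controlled data.
    import os
    import requests

    DATA_URL = "https://data.example.org/dataset.tar.gz"  # hypothetical endpoint
    TOKEN = os.environ.get("DATA_ACCESS_TOKEN")           # credential supplied to the container

    # Send an Authorization header only if a token was provided;
    # without one, this degrades to the plain public read-only case.
    headers = {"Authorization": f"Bearer {TOKEN}"} if TOKEN else {}

    # Stream the download so a multi-gigabyte file never sits in memory at once.
    with requests.get(DATA_URL, headers=headers, stream=True, timeout=60) as resp:
        resp.raise_for_status()  # a 401/403 here means the credentials were rejected
        with open("dataset.tar.gz", "wb") as out:
            for chunk in resp.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
                out.write(chunk)

The read-only half is easy enough; it's the "really good credentials also get write access" half, and doing all of this in a way that plays nicely with repo2docker-built images, that I don't have a good answer for.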