
Great question.  My name's Matt Turk and along with some other folks
(lurking?) on this list I work on a project called Whole Tale.  We
just had an overview paper published (gold OA) at
https://doi.org/10.1016/j.future.2017.12.029 that gives some
architectural information, but the gist is that we're trying to solve
that exact problem.  Our website isn't the best, and we're not
confident of a stable, running instance until early summer (I bet if
you logged in you could find ways to break it or prickly bits in the
UI), but you can find a bit more at wholetale.org and
github.com/whole-tale .  You could even launch your own instance,
should you want to.

The long and the short of it is that we run Docker containers (not
only Jupyter, but Jupyter is currently one of the defaults) with
computational environments and "inject" data through a handcrafted
FUSE fs.

The ultimate location of the data is not important (it can be either
local or remote), as long as you provide a valid URI containing both
location and transfer protocol (e.g. 'http://example.com/file',
'globus:/endpoint/foo/bar'). There are a couple of additional
attributes you need to provide (size & name, although over HTTP we can
sometimes get these ourselves). We keep track of all of those in an
external db (MongoDB via Girder), which FUSE then uses to resolve
OS-level IO calls into appropriate requests for data. For example,
when you open() a file that's registered as an 'http://' URL, it will
(invisibly) cache it locally and present it as though it were local.
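
To make the open() example a bit more concrete, here is a minimal
sketch of the idea in plain Python. It is not our actual code: the
names Entry, CachingResolver, open_file and size_of are made up for
illustration, a dict stands in for the Girder/MongoDB registry, and an
ordinary method call stands in for the FUSE layer. It only handles
schemes that urllib understands (http, https, file), so something like
'globus:' would need its own transfer code.

```python
import os
import urllib.request
from dataclasses import dataclass


@dataclass
class Entry:
    """One registered file: where it lives plus the required attributes."""
    uri: str    # location and transfer protocol, e.g. 'http://example.com/file'
    name: str   # name the file is presented under
    size: int   # size in bytes, supplied at registration time


class CachingResolver:
    """Hypothetical stand-in for the db-backed lookup done by the FUSE fs."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        self.registry = {}  # stands in for the external db

    def register(self, entry):
        self.registry[entry.name] = entry

    def size_of(self, name):
        # Metadata comes from the registry, so no data is fetched here.
        return self.registry[name].size

    def open_file(self, name, mode="rb"):
        entry = self.registry[name]
        local = os.path.join(self.cache_dir, entry.name)
        if not os.path.exists(local):
            # First access: fetch the data and cache it locally.
            with urllib.request.urlopen(entry.uri) as src, open(local, "wb") as dst:
                dst.write(src.read())
        # Later accesses are served straight from the local cache.
        return open(local, mode)
```

After register(Entry(uri=..., name="data.bin", size=...)), calling
open_file("data.bin") returns an ordinary file object, and the caller
never needs to know whether the bytes were just fetched or were
already cached.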

Kacper Kowalik, our software architect, recently gave a presentation
on it that might be of interest; you can see it here:
http://use.yt/upload/c8236396 .

I'd be happy to share more here or offline, too. This is something
we're working on pretty hard, and while we have a ways to go
(especially in smoothing things out from a UI/UX perspective and
getting the platform stable), we really want to engage much more
deeply with folks throughout the community.

-Matt, on behalf of the Whole Tale team

On Thu, Mar 15, 2018 at 1:06 PM, 'Aaron Watters' via Project Jupyter
<jupyter@googlegroups.com> wrote:
> Hi folks,
> I'm interested in techniques for sharing data in scientific workflows.
> Tools like git/github and docker/repo2docker are great for sharing
> computational environments and moderate sized data, but not good for
> sharing (say) hundreds of gigabytes of data.  What do people do?
> I have in mind something like this: a scientist on a good network
> spins up a jupyter server in a Docker container containing a workflow
> using github and repo2docker.  In the container s/he provides some
> authorization credentials and data for the workflow appears in the
> container if the credentials are valid, maybe with read/write access
> of some sort if the credentials are really good.
> If we are interested in providing publicly accessible data in read
> only mode we could just dump the data to a web server anywhere
> and pull it down using HTTP.
> I don't know the right way to do this if we want to have limited access
> to the data and sometimes provide the ability to write the data.
> I'm also interested in the case where the scientist is remote --
> ie, certain people are allowed to use our compute cluster, possibly
> with data they have locally or with other data out there somewhere...
> Any and all thoughts or pointers appreciated.  Thanks!
> Sorry if the question is silly or too vague.
>    -- Aaron Watters
> --
> You received this message because you are subscribed to the Google Groups
> "Project Jupyter" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to jupyter+unsubscr...@googlegroups.com.
> To post to this group, send email to jupyter@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/jupyter/003e34fa-a547-40c5-a617-8997ee5db326%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
