This is absolutely essential information. Thank you, Tim. My standard use case is a pre-configured Ubuntu Linux worker, launched from my own AMI in the same region in which the HCP data are provided (us-east-1). I then download the data from there, which can take some time depending on the bandwidth of the chosen instance, and push my results to my own S3 repositories. I was thinking that mounting the data could be a good idea: it would potentially cut down the transfer time and solve the capacity problems of the worker, which can have limited disk space. I personally would not have a problem switching to the NITRC AMI if it behaves like a standard Linux worker.
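A minimal sketch of this copy-based workflow using the AWS CLI (the HCP/100307 prefix inside the hcp-openaccess bucket is an assumed example path, and my-results-bucket is a placeholder for your own bucket; credentials are assumed to be set up via `aws configure`):

    # Pull one subject's structural data from the HCP OpenAccess bucket:
    aws s3 sync s3://hcp-openaccess/HCP/100307/T1w/ /data/100307/T1w/

    # ...run processing on the instance...

    # Push results to your own bucket:
    aws s3 sync /data/100307/results/ s3://my-results-bucket/100307/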
Thank you once more,
Denis

On Wed, Oct 19, 2016 at 7:55 PM Timothy B. Brown <[email protected]> wrote:

Hello Denis,

I understand that Robert Oostenveld is planning to send you some materials from the latest HCP Course that illustrate how to mount the HCP OpenAccess S3 bucket as a directory accessible from a running EC2 instance. However, I'd like to clarify a few things.

First, the materials you will receive from Robert assume that you are using an Amazon EC2 instance (virtual machine) *that is based on an AMI supplied by NITRC* (analogous to a DVD of software supplied and configured by NITRC to be loaded on your virtual machine). In fact, the instructions show you how to create a new EC2 instance based on that NITRC AMI. The folks at NITRC have done a lot of the work for you (like including the necessary software to mount an S3 bucket) and provided a web interface for you to specify your credentials for accessing the HCP OpenAccess S3 bucket.

If you want to create an EC2 instance based on the NITRC AMI, then things should work well for you, and the materials Robert sends you should be helpful. But they will not be particularly useful if you are using an EC2 instance that is *not* based on the NITRC AMI. In that case, you will have to do a bit more work. You will need to install a tool called *s3fs* ("S3 File System") on your instance and then configure s3fs to mount the HCP OpenAccess S3 bucket. This configuration includes storing your AWS access key information in a secure file on your running instance. A good starting point for instructions can be found at:

https://forums.aws.amazon.com/message.jspa?messageID=313009

This may not cover all the issues you encounter, and you may have to search for other documentation on using s3fs under Linux to get things fully configured. The information at:

https://rameshpalanisamy.wordpress.com/aws/adding-s3-bucket-and-mounting-it-to-linux/

may also be helpful.

Second, once you get the S3 bucket mounted, it is very important to realize that it is *read-only* from your system. By mounting the S3 bucket using s3fs, you have not created an actual EBS volume on your system that contains the HCP OpenAccess data, just a mount point from which you can *read* the files in the S3 bucket. You will likely want to create a separate EBS volume on which to run pipelines, generate new files, and do any further analysis. To work with the data, you will want the HCP OpenAccess S3 bucket data to at least *appear* to be on that separate EBS volume.

One approach would be to selectively copy data files from the mounted S3 data onto your EBS volume. However, this would duplicate a lot of data onto the EBS volume, taking a long time and costing you money for storage of data that is already in the S3 bucket. I think a better approach is to create a directory structure on your EBS volume that contains files which are actually symbolic links to the read-only data accessible via your S3 mount point.

The materials that Robert sent (or will send) you contain instructions for how to get and use a script that I've written that creates such a directory structure of symbolic links. After looking over those instructions, if it is not obvious which script I'm referring to and how to use it, feel free to send me a follow-up question.
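On a stock Ubuntu instance (i.e. not the NITRC AMI), the s3fs setup described above might look roughly like the following minimal sketch; the package name, mount point, and cache path are assumptions, and the key values are placeholders for your HCP credentials:

    # Install s3fs (packaged on recent Ubuntu releases; older guides
    # build s3fs-fuse from source):
    sudo apt-get update && sudo apt-get install -y s3fs

    # Store the AWS access key pair in the ACCESS_KEY:SECRET_KEY format
    # that s3fs expects, readable only by you:
    echo "YOUR_ACCESS_KEY_ID:YOUR_SECRET_ACCESS_KEY" > "$HOME/.passwd-s3fs"
    chmod 600 "$HOME/.passwd-s3fs"

    # Create a mount point you own, then mount the bucket read-only:
    sudo mkdir -p /s3/hcp-openaccess
    sudo chown "$USER" /s3/hcp-openaccess
    s3fs hcp-openaccess /s3/hcp-openaccess \
        -o passwd_file="$HOME/.passwd-s3fs" \
        -o ro -o use_cache=/tmp/s3fs-cache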
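Setting up the separate EBS working volume is a standard step. Assuming a freshly attached, empty volume visible as /dev/xvdf (device names vary by instance type; it may appear as /dev/nvme1n1 on newer instances) and /data as the mount point:

    # Format and mount a newly attached, empty EBS volume:
    sudo mkfs -t ext4 /dev/xvdf
    sudo mkdir -p /data
    sudo mount /dev/xvdf /data
    sudo chown "$USER" /data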
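The symbolic-link approach can also be sketched directly. The following is a hypothetical stand-in, *not* the script Tim mentions (that comes with the HCP Course materials): it mirrors one subject's directory tree from the read-only S3 mount onto the EBS volume, linking each file instead of copying it. The subject ID and both paths are assumptions:

    #!/bin/bash
    # Hypothetical sketch: mirror a subject's directory tree, replacing
    # each file with a symlink to the read-only S3 mount.
    SRC=/s3/hcp-openaccess/HCP/100307   # read-only s3fs mount
    DST=/data/100307                    # writable EBS working copy

    # Recreate the directory structure...
    find "$SRC" -type d | while read -r dir; do
        mkdir -p "$DST${dir#"$SRC"}"
    done

    # ...and symlink every file instead of copying it:
    find "$SRC" -type f | while read -r file; do
        ln -s "$file" "$DST${file#"$SRC"}"
    done

With GNU coreutils, the same effect is available in one line, since `cp -s` makes symbolic links instead of copies: `cp -rs "$SRC" /data/`.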
Hope that's helpful,
Tim

On 10/18/2016 10:51 AM, Denis-Alexander Engemann wrote:

Dear HCPers,

I recently had a conversation with Robert, who suggested to me that it should be possible to directly mount the HCP data like an EBS volume instead of using the S3 tools to copy the data file by file. Any hint would be appreciated.

Cheers,
Denis

--
Timothy B. Brown
Business & Technology Application Analyst III
Pipeline Developer (Human Connectome Project)
tbbrown(at)wustl.edu

_______________________________________________
HCP-Users mailing list
[email protected]
http://lists.humanconnectome.org/mailman/listinfo/hcp-users
