To Geoff and other hcp-users,

Thanks, Geoff, for your questions about the architecture of the HCP informatics domain. Your "interpretation #3" is closest to the mark for where HCP is headed. In many instances, users will find it desirable to have HCP process data queries centrally (on our high-performance computer linked to the ConnectomeDB database) and then return the results to Connectome Workbench running on a local platform. In other circumstances, investigators with computationally intensive queries will be better served by having a local copy of the data and processing it locally. In such cases, investigators may obtain “Connectome in a Box” at cost (hard drive plus shipping). We will provide additional information and guidance about these and other options in conjunction with our February release.

Although it's more than you asked for, I'd like to take this opportunity to give the hcp-users group a high-level picture of what will (and will not) become available in the coming months.

What's the purpose of the HCP October initial data release?

Our October data release is intended to let investigators 'get their feet wet' with an initial bolus of high-quality datasets (12 subjects, ~25 GB/subject). For example, some investigators may want to modify their own analysis tools in order to make the best use of HCP data. By providing the October dataset as an example, investigators have lead time to do work of this kind, so that they can begin real data analysis once the Q1 data release occurs in February 2013.

For two reasons, the October data are not intended for analyses that will lead immediately to scientific publications. (1) The minimally preprocessed datasets released in October come with a caveat emptor, because our pipelines have not been completely finalized. Indeed, a recent (Nov 9) email reported a glitch in how the fMRI timeseries data were processed; in the meantime we have also made several improvements to the pipelines. Version 2 of the initial datasets (the same 12 subjects released in October) is currently being reprocessed and will be released in about a week. (2) In addition, many of the 12 subjects are related to one another (twins or non-twin siblings), which could bias the results of some analyses.

What data will be released in February?

We expect to release data for ~70 subjects for whom we have complete scan sessions acquired during the first quarter. (Recall that it will take three years to scan all 1200 subjects!) This will include the unprocessed data and the minimally processed data, akin to those provided in the October data release.

Restricted Access Data. To obtain information about family structure (identification of twins and siblings), investigators will be required to sign a special data use agreement. Many investigators may instead (or also) elect to start with data from a group of 20 unrelated subjects that will be made freely accessible (so that there are no complications regarding family structure).

Fully processed data. Fully processed data will likely include 'dense connectomes' for functional and structural connectivity, plus task-fMRI activation patterns. We also hope to include 'parcellated connectomes' based on initial connectivity-based parcellations derived from HCP data - but no promises! These fully processed datasets will be based on the initial group of 20 unrelated subjects.

What is ConnectomeDB?

ConnectomeDB is the external-facing HCP database, based on the XNAT platform developed in Dan Marcus' lab.
Data storage is on a BlueArc hardware system whose eventual capacity will be ~1 petabyte. In conjunction with the February data release, we will allow the community to access HCP data via ConnectomeDB and to download selected datasets from within ConnectomeDB, albeit with file-size restrictions. Larger datasets will be downloadable via ftp or obtainable via the “Connectome in a Box” solution mentioned above.

What data mining capabilities will be offered in February?

This is a work in progress. For those familiar with XNAT, many of XNAT's search capabilities will be available in ConnectomeDB, customized to handle unique aspects of the HCP datasets. This will include options to select subgroups of individuals based on a variety of behavioral and other measures. We are also aiming to provide options to view average functional connectivity maps for different groups and different brain regions of interest.

What is the Connectome Toolbox?

The 'connectome toolbox' you asked about is not a formally defined entity; rather, it is our name for the growing collection of tools that HCP will provide to the community in addition to the data itself. This will include the Connectome Workbench visualization and analysis platform, plus resources such as the code for our analysis pipelines and our in-scanner task-fMRI tasks. We intend to make all data and tools freely available to those who want them, although this will have to occur in an orderly and logical progression as the tools themselves become ready.

There's a lot more under the hood, but hopefully this brief overview will be useful.
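Since bandwidth and storage were part of your question, here is a rough back-of-the-envelope sketch (a short Python snippet) of the data volumes behind the options above. It uses only the approximate figures quoted in this email (~25 GB of minimally preprocessed data per subject; 12, ~70, and 1200 subjects) and assumes, purely for illustration, a sustained 100 Mbit/s download link; actual per-subject sizes and transfer rates will of course vary, so treat these as order-of-magnitude estimates rather than official numbers.

    # Back-of-the-envelope HCP data volumes (illustrative only).
    # The subject counts and ~25 GB/subject figure are the approximate
    # numbers quoted in this email; the 100 Mbit/s link speed is an
    # assumption made purely for illustration.
    GB_PER_SUBJECT = 25                  # ~25 GB/subject, minimally preprocessed
    LINK_BYTES_PER_SEC = 100e6 / 8       # sustained 100 Mbit/s ~= 12.5 MB/s

    releases = [
        ("October initial release", 12),
        ("Q1 release (Feb 2013)", 70),
        ("Full study (3 years)", 1200),
    ]

    for label, n_subjects in releases:
        total_gb = n_subjects * GB_PER_SUBJECT
        days = (total_gb * 1e9) / LINK_BYTES_PER_SEC / 86400
        print(f"{label}: ~{total_gb:,} GB, ~{days:.1f} days at 100 Mbit/s")

    # Approximate output:
    #   October initial release: ~300 GB, ~0.3 days at 100 Mbit/s
    #   Q1 release (Feb 2013): ~1,750 GB, ~1.6 days at 100 Mbit/s
    #   Full study (3 years): ~30,000 GB, ~27.8 days at 100 Mbit/s

These rough numbers illustrate why we expect selected downloads and server-side processing to cover many use cases, while investigators who want large fractions of the data locally will generally be better served by ftp transfers or "Connectome in a Box."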
If you know colleagues who may be interested in this information, please feel free to forward them this email; also, encourage them to join hcp-users at http://lists.humanconnectome.org/mailman/listinfo/hcp-users

David VE

On Nov 9, 2012, at 12:18 PM, Geoff Pope wrote:

Hi. I'm trying to get a general understanding of the architecture of the system: what are all the software components, where will they run, how will we use them, and what are the bandwidth and storage requirements?

I have looked at this http://www.humanconnectome.org/connectome/ and this http://www.humanconnectome.org/about/project/informatics.html, but there is more than one possible interpretation; please clarify.

Interpretation 1:
- the 1200 subjects' data are stored on the connectome server.
- the "connectome toolbox" is software running on the connectome server.
- users log into the connectome server and run scripts which call "connectome toolbox" functions, to select subjects and scans, set up statistical tests, and produce output images.
- the workbench runs on the user's desktop machine (the client).
- the workbench downloads the output images created by the "connectome toolbox" running on the server, and displays them (the workbench is analogous to fsl's fslview or freesurfer's tksurfer, with added ftp functionality).
(Here only the output images are downloaded, so this option has low bandwidth and client storage requirements.)

Interpretation 2:
- the 1200 subjects' data are stored on the connectome server.
- the "connectome toolbox" is command-line software running on the user's computer (the client).
- the connectome toolbox is used to select and download scans from the connectome server, to set up statistical tests, and to do processing on the client.
- the workbench is a client-side tool for displaying the images created by the client-side "connectome toolbox".
(This option has high bandwidth and client storage requirements.)

Interpretation 3:
- the 1200 subjects' data are stored on the connectome server.
- the workbench runs on the user's desktop machine (the client).
- the workbench is used to orchestrate processing on the server (select subjects and scans, set up statistical tests, display output images, all using a GUI).
(Here only the output images are downloaded, so this option has low bandwidth and client storage requirements.)

How will it work?

Thanks,
Geoff Pope

_______________________________________________
HCP-Users mailing list
[email protected]
http://lists.humanconnectome.org/mailman/listinfo/hcp-users
