Hi Yaroslav,

To add to what Mike Harms just wrote, it still sounds like you are thinking of 
the packages as data bundles for groups of subjects. Subjects don't "belong" to 
a package because the packages don't contain groups of subjects. Instead, the 
packages are separate and specific to a particular subject ID, modality, 
processing level, and smoothing level (for fMRI).


I think the confusion is that individual subject packages can be queued for 
download in groups in ConnectomeDB. We could provide you with lists of subjects 
for each searchable group in ConnectomeDB (e.g. U100, 7T data available, MEG 
data available); some users may have queued all subjects in such a group 
for download of specific data packages for their analysis.


Another slight complication is that there is a little bit of overlap in the 
data in the packages themselves so that there is more than one package 
associated with some of the files. This was done so that users would have 
everything they need to do a certain analysis from a particular package. For 
example, the FIX and FIX-extended packages include a few of the same output 
files, although the FIX-extended package has a lot more of the FIX intermediate 
files to allow users to evaluate how FIX worked for a particular subject.
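
The overlap between packages can be pictured as a mapping from each file to
every package that ships it. The following is only a minimal Python sketch of
that idea: the FIX and FIX-extended package names come from the paragraph
above, but the file names are made-up placeholders rather than actual HCP
paths, and the helper name is hypothetical.

```python
from collections import defaultdict

# Hypothetical package contents for a single subject. The file names are
# illustrative placeholders, NOT actual HCP file paths.
package_files = {
    "FIX": ["cleaned_output.nii.gz", "fix_summary.txt"],
    "FIX-extended": [
        "cleaned_output.nii.gz",
        "fix_summary.txt",
        "intermediate_features.csv",
    ],
}


def packages_by_file(package_files):
    """Invert a package -> files mapping into file -> sorted package names."""
    index = defaultdict(set)
    for package, files in package_files.items():
        for path in files:
            index[path].add(package)
    return {path: sorted(pkgs) for path, pkgs in index.items()}


index = packages_by_file(package_files)

# Files that are shipped in more than one package.
shared = sorted(path for path, pkgs in index.items() if len(pkgs) > 1)
```

With such an index, a single file can carry several package annotations at
once, which is what the overlap described above would require.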


Best,

Jenn


Jennifer Elam, Ph.D.
Scientific Outreach, Human Connectome Project
Washington University School of Medicine
Department of Neuroscience, Box 8108
660 South Euclid Avenue
St. Louis, MO 63110
314-362-9387
[email protected]
www.humanconnectome.org


________________________________
From: [email protected] 
<[email protected]> on behalf of Yaroslav Halchenko 
<[email protected]>
Sent: Tuesday, December 6, 2016 1:47:19 PM
To: [email protected]
Subject: Re: [HCP-Users] (files) listing for file bundles

On Tue, 06 Dec 2016, Elam, Jennifer wrote:
>    A listing of the unpacked files available for each subject, organized by
>    modality and processing level, is available in Appendix 3 of the
>    Reference Manual.

>    The files are listed there as they unpack into a standard directory
>    structure. They are not organized by ConnectomeDB packages, per se,
>    because the listing is to be also applicable to users of Connectome in a
>    Box and Amazon S3. If you really need a listing of the package contents
>    themselves, we (Mike Hodge) can provide that separately.


On Tue, 06 Dec 2016, Hodge, Michael wrote:
> Yaroslav,

> Separate packages are created for each subject.  The list I sent just listed 
> packages for a couple of subjects to show you the files contained in the 
> packages by example.  There aren't packages that correspond to the unrelated 
> groups.  Each subject in the groups has a set of packages.  I could repeat 
> the unzip search across all subjects if you wish, but it would be a very 
> large file.


Dear Jennifer and Michael,

Thank you for your replies!

Let me describe my target use case and why I was asking about
packages, which may make the situation a bit clearer.

The HCP S3 bucket provides convenient access to the dataset's individual files,
but they lack annotation on which package(s) (as shipped from db.) any
particular file belongs to.  Such "packaging" is important
meta-information, since many folks analyze data from a particular "package".

In the datalad project we would like to provide access to data from the HCP
bucket, and also to allow users to specify "packages" to select which specific
sub-datasets to install (e.g. not all subjects when not all subjects belong to
a package) and which files to download.  It would look like the following,
assuming that 7T_MOVIE_2mm_preproc is the name of an example package which
contains a subset of subjects with 7T movie "task" data.

        datalad search 7T_MOVIE_2mm_preproc | xargs datalad install

to install those subjects' datasets (git-annex repositories without actual data
by default), and then (hypothetical API)

        datalad get -r --annex-meta 7T_MOVIE_2mm_preproc

to actually fetch data files present in the  7T_MOVIE_2mm_preproc  package.

Similarly, they could later run

Since, I guess, you are already composing those "packages" from a list of
rules/files, I thought that those could perhaps be shared, so that we could
embed that information in our annex HCP repositories without incurring any
additional "development/setup/maintenance cost" (such as dumping listings of
the generated .zip files).  Then, given plain .txt files with the listings
(easily machine-readable, unlike formatted PDFs), people could also easily
come up with their own one-line shell scripts to fetch the files corresponding
to a package from S3.

So, overall, the listings produced by Michael would work, but I wondered if we
could avoid (re)creating them and possibly make them even better for
machine parsing (e.g. one .txt file per package, which would include paths
to the files for all the subjects in that package).
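
As a rough illustration of the one-.txt-file-per-package idea, here is a
minimal Python sketch. It assumes a listing format that does not exist yet:
one relative path per line, the first path component being the subject ID,
with blank lines and lines starting with '#' ignored. The bucket prefix,
subject IDs, and paths are made-up placeholders, and both function names are
hypothetical.

```python
def listing_to_urls(listing_text, bucket_prefix):
    """Turn a plain-text package listing into full object URLs.

    Assumed format: one relative path per line; blank lines and lines
    starting with '#' are skipped.
    """
    prefix = bucket_prefix.rstrip("/")
    urls = []
    for line in listing_text.splitlines():
        path = line.strip()
        if not path or path.startswith("#"):
            continue
        urls.append(f"{prefix}/{path.lstrip('/')}")
    return urls


def subjects_in_listing(listing_text):
    """Collect subject IDs, assuming the first path component is the subject."""
    subjects = set()
    for line in listing_text.splitlines():
        path = line.strip().lstrip("/")
        if not path or path.startswith("#"):
            continue
        subjects.add(path.split("/", 1)[0])
    return sorted(subjects)


# Hypothetical listing for one package; every path below is a placeholder.
example_listing = """\
# 7T_MOVIE_2mm_preproc
SUBJ001/example_dir/movie_run1.nii.gz
SUBJ002/example_dir/movie_run1.nii.gz
"""

# "s3://example-bucket/HCP" is a placeholder, not the real bucket location.
urls = listing_to_urls(example_listing, "s3://example-bucket/HCP")
subjects = subjects_in_listing(example_listing)
```

The subject list would tell us which sub-datasets to install, and the URL
list which files to fetch for the package.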

BTW, the 7T_MOVIE_2mm_preproc set of files is not yet in the S3 bucket.  When
will that portion be uploaded?

--
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik
_______________________________________________
HCP-Users mailing list
[email protected]
http://lists.humanconnectome.org/mailman/listinfo/hcp-users
