On Fri, Nov 8, 2013 at 8:41 AM, Martin Morgan <mtmor...@fhcrc.org> wrote:

> On 11/07/2013 09:26 PM, Nicolas De Jay wrote:
>
>> Thanks for the prompt answer.  The data set I am packaging closely
>> resembles that of minfiData except that there are 52 samples; the IDAT
>> files together are some 800MB whereas the Rda file is closer to 150MB.
>> It is worth noting that my experiment data package will be submitted
>> to Bioconductor along with a software package which makes use of these
>> samples in the vignette.  With this in mind, can I omit the IDAT
>> files?  If this goes against Bioconductor's underlying design, what
>> would you say is the maximum size of an experiment data package?
>>
>
> Hi Nicolas -- Some things to bear in mind.
>
>
Hi, Nicolas.

I just wanted to note that experiment data packages are meant as a
convenient way to distribute data so that reproducible workflows and
documentation can be created easily.  There are other options such as
accessing the data directly from public repositories using Bioconductor
tools that serve the same purpose.  While accessing such online resources
does necessitate a one-time network connection (after which packages like
GEOquery can use locally cached data), when appropriate datasets exist in
public repositories, doing so may be a perfectly viable alternative to experiment
data packages.  In this particular case, as of today in NCBI GEO, there are
1711 Human 450k samples with IDAT files available.  I am not arguing that
this route should replace experiment data packages, just that stable public
data resources are an alternative worth considering.

Sean



> Files are compressed in package tarballs, so your IDAT files may have a
> considerably smaller effective size.
>
> Generally, original text files are a much better way to store external
> data than Rda files. For instance, Rda files require updating when / if the
> class definition changes, whereas the provenance and content of the original
> files are unambiguous.
>
> Experiment data packages are meant to provide reusable examples for
> pedagogic purposes. One would hope that minfiData fulfills this
> requirement. If not, then it would be better to continue the current
> discussion with Kasper and others in the community to identify an
> appropriately comprehensive data set for use across many relevant packages.
>
> There is no formal statement about the maximum size of experiment data
> packages, but one would need to make a strong argument for why a GB of
> experiment data is necessary (including why existing experiment data
> packages are fundamentally inadequate), especially if it is to support a
> single package.
>
> Martin
>
>
>> ---
>> Nicolas De Jay
>>
>> On Thu, Nov 7, 2013 at 9:38 PM, Kasper Daniel Hansen
>> <kasperdanielhan...@gmail.com> wrote:
>>
>>> To give some background: it is true that the RGsetEx object (in
>>> data/RGsetEx.rda) is in 1-1 correspondence with the raw data files in
>>> inst/extdata, so one could consider it redundant.  However, having the
>>> IDAT files is convenient for testing parsing, and also for other tools
>>> that want 450k example data without depending on minfi.  Those are the
>>> two main reasons for including the raw data as well.  There is also the
>>> fact that, while the data size is "big", it is only 6 samples.
>>>
>>> Best,
>>> Kasper
>>>
>>>
>>> On Thu, Nov 7, 2013 at 3:58 PM, Nicolas De Jay
>>> <nicolas.de...@mail.mcgill.ca> wrote:
>>>
>>>>
>>>> Hi,
>>>>
>>>> I am preparing a data package and using the minfiData package as a
>>>> reference.  The .idat files in extdata and the .rda file in data are
>>>> present in both the compressed source tarball and the installed package
>>>> directory (in my case, under ~/R/x86-64.../3.0/minfiData).  Isn't
>>>> this redundant?  Is there a way to have the prospective user only
>>>> download the .rda files?
>>>>
>>>> Sorry if my question is misguided and thanks in advance for your help.
>>>>
>>>> ---
>>>> Nicolas De Jay
>>>> M.Sc. Student
>>>> Department of Human Genetics
>>>> Montreal Children's Hospital Research Institute,
>>>> McGill University Health Centre
>>>> 4060 Ste Catherine West, PT-239
>>>> Montreal, QC H3Z2Z3, Canada
>>>> T: (514) 412-4440 | E: nicolas.de...@mail.mcgill.ca
>>>>
>>>> _______________________________________________
>>>> Bioc-devel@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
>>>
>>>
>>>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>
