On 11/07/2013 09:26 PM, Nicolas De Jay wrote:
Thanks for the prompt answer. The data set I am packaging closely
resembles that of minfiData except that there are 52 samples; the IDAT
files together are some 800MB whereas the Rda file is closer to 150MB.
It is worth noting that my experiment data package will be submitted
to Bioconductor along with a software package which makes use of these
samples in the vignette. With this in mind, can I omit the IDAT
files? If this goes against Bioconductor's underlying design, what
would you say is the maximum size of an experiment data package?
Hi Nicolas -- Some things to bear in mind.
Files are compressed in package tarballs, so your IDAT files may have a
considerably smaller effective size.
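
One rough way to check this before building the package is to compare the raw size of a data directory with the size of a gzip-compressed tar archive of it. The sketch below assumes gzip compression, as in a .tar.gz source tarball. The helper name tarball_sizes and the demo file are hypothetical and not part of any Bioconductor tooling.

```python
import io
import os
import tarfile
import tempfile


def tarball_sizes(directory):
    """Return (raw_bytes, gzipped_tar_bytes) for all files under directory."""
    # Sum the on-disk sizes of every regular file in the tree.
    raw = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            raw += os.path.getsize(os.path.join(root, name))

    # Build a gzip-compressed tar archive in memory and measure it.
    buffer = io.BytesIO()
    arcname = os.path.basename(os.path.normpath(directory))
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        tar.add(directory, arcname=arcname)
    return raw, len(buffer.getvalue())


if __name__ == "__main__":
    # Hypothetical stand-in for inst/extdata: a single, highly repetitive file.
    with tempfile.TemporaryDirectory() as tmp:
        extdata = os.path.join(tmp, "extdata")
        os.makedirs(extdata)
        with open(os.path.join(extdata, "demo.txt"), "wb") as handle:
            handle.write(b"0123456789" * 100_000)
        raw, packed = tarball_sizes(extdata)
        print(f"raw: {raw} bytes, gzipped tar: {packed} bytes")
```

The demo file compresses unusually well because it is repetitive. Real IDAT files may shrink much less, so the only reliable figure comes from running the function on the actual directory.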
Generally, original text files are a much better way to store external data than
Rda files. For instance, Rda files require updating if the class definition
changes, whereas with the original files the provenance and content of the data
are unambiguous.
Experiment data packages are meant to provide reusable examples for pedagogic
purposes. One would hope that minfiData fulfills this requirement. If not, then
it would be better to continue the current discussion with Kasper and others in
the community to identify an appropriately comprehensive data set for use across
many relevant packages.
There is no formal statement about the maximum size of experiment data packages,
but one would need to make a strong argument for why a GB of experiment data is
necessary (including why existing experiment data packages are fundamentally
inadequate), especially if it is to support a single package.
Martin
---
Nicolas De Jay
On Thu, Nov 7, 2013 at 9:38 PM, Kasper Daniel Hansen
<kasperdanielhan...@gmail.com> wrote:
To give some background: it is true that the RGsetEx object (in
data/RGsetEx.rda) is in 1-1 correspondence with the raw data files in
inst/extdata, so one could consider it redundant. However, having the IDAT
files is convenient for testing parsing, and also for other tools that want
450k example data but do not want to depend on minfi. Those are the
two main reasons for including the raw data as well. There is also the fact
that, while the data size is "big", it is only 6 samples.
Best,
Kasper
On Thu, Nov 7, 2013 at 3:58 PM, Nicolas De Jay
<nicolas.de...@mail.mcgill.ca> wrote:
Hi,
I am preparing a data package and using the minfiData package as a
reference. The .idat files in extdata and the .rda file in data are
both present in the compressed source tarball and in the installed
copy (in my case, under ~/R/x86-64.../3.0/minfiData). Isn't
this redundant? Is there a way to have the prospective user only
download the .rda files?
Sorry if my question is misguided and thanks in advance for your help.
---
Nicolas De Jay
M.Sc. Student
Department of Human Genetics
Montreal Children's Hospital Research Institute, McGill University Health
Centre
4060 Ste Catherine West, PT-239
Montreal, QC H3Z2Z3, Canada
T: (514) 412-4440 | E: nicolas.de...@mail.mcgill.ca
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793