In that case, I will check whether the public databases hold the kind of data sets I am trying to package, and run the idea by the team assigned to the project I am developing.

Thank you Martin, Sean and Kasper for your valuable insight!
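Before dropping the IDAT files, Martin's point about tarball compression (quoted below) can be checked directly. Here is a rough sketch, in Python with the standard library only, that compares the raw size of a directory such as inst/extdata with its size inside a gzipped tar archive. The function name `tarball_sizes` is made up for illustration, and plain gzip here only approximates what a package build would produce, so the numbers are an estimate rather than the real tarball size.

```python
import os
import tarfile
import tempfile


def tarball_sizes(directory):
    """Return (raw_bytes, gzipped_tar_bytes) for all files under `directory`.

    Hypothetical helper: it only approximates a package source tarball by
    gzip-compressing a tar archive of the directory.
    """
    # Sum the on-disk size of every regular file in the tree.
    raw = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            raw += os.path.getsize(os.path.join(root, name))

    # Write the archive to a temporary file so large inputs are not held in memory.
    with tempfile.TemporaryFile() as tmp:
        with tarfile.open(fileobj=tmp, mode="w:gz") as archive:
            archive.add(
                directory,
                arcname=os.path.basename(os.path.normpath(directory)),
            )
        # The file position after closing the archive equals the bytes written.
        compressed = tmp.tell()

    return raw, compressed


# Example call (the path is an assumption about the package layout):
# raw, packed = tarball_sizes("inst/extdata")
# print(f"raw: {raw / 1e6:.1f} MB, gzipped tar: {packed / 1e6:.1f} MB")
```

How much is saved depends entirely on how compressible the IDAT files are, so this is worth running on the real files rather than guessing.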
---
Nicolas De Jay

On Fri, Nov 8, 2013 at 9:07 AM, Sean Davis <sdav...@mail.nih.gov> wrote:
>
> On Fri, Nov 8, 2013 at 8:41 AM, Martin Morgan <mtmor...@fhcrc.org> wrote:
>>
>> On 11/07/2013 09:26 PM, Nicolas De Jay wrote:
>>>
>>> Thanks for the prompt answer. The data set I am packaging closely
>>> resembles that of minfiData except that there are 52 samples; the IDAT
>>> files together are some 800MB whereas the Rda file is closer to 150MB.
>>> It is worth noting that my experiment data package will be submitted
>>> to Bioconductor along with a software package which makes use of these
>>> samples in the vignette. With this in mind, can I omit the IDAT
>>> files? If this goes against Bioconductor's underlying design, what
>>> would you say is the maximum size of an experiment data package?
>>
>> Hi Nicolas -- Some things to bear in mind.
>
> Hi, Nicolas.
>
> I just wanted to note that experiment data packages are meant as a
> convenient way to distribute data so that reproducible workflows and
> documentation can be created easily. There are other options that serve
> the same purpose, such as accessing the data directly from public
> repositories using Bioconductor tools. While accessing such online
> resources does necessitate a one-time network connection (after which
> packages like GEOquery can use locally cached data), when appropriate
> datasets exist in public repositories, it may be a perfectly viable
> alternative to experiment data packages. In this particular case, as of
> today in NCBI GEO, there are 1711 Human 450k samples with IDAT files
> available. I am not arguing that this route should replace experiment
> data packages, just that stable public data resources are an alternative
> worth considering.
>
> Sean
>
>> Files are compressed in package tar balls, so your IDAT files may have a
>> considerably smaller effective size.
>>
>> Generally, original text files are a much better way to store external
>> data than Rda files. For instance, Rda files require updating when / if
>> the class definition changes, whereas the provenance and content of the
>> original files are unambiguous.
>>
>> Experiment data packages are meant to provide reusable examples for
>> pedagogic purposes. One would hope that minfiData fulfills this
>> requirement. If not, then it would be better to continue the current
>> discussion with Kasper and others in the community to identify an
>> appropriately comprehensive data set for use across many relevant
>> packages.
>>
>> There is no formal statement about the maximum size of experiment data
>> packages, but one would need to make a strong argument for why a Gb of
>> experiment data is necessary (including why existing experiment data
>> packages are fundamentally inadequate), especially if it is to support a
>> single package.
>>
>> Martin
>>
>>> ---
>>> Nicolas De Jay
>>>
>>> On Thu, Nov 7, 2013 at 9:38 PM, Kasper Daniel Hansen
>>> <kasperdanielhan...@gmail.com> wrote:
>>>>
>>>> To give some background: it is true that the RGsetEx object (in
>>>> data/RGsetEx.rda) is in 1-1 correspondence with the raw data files in
>>>> inst/extdata, so one could consider it redundant. However, having the
>>>> IDAT files is convenient for testing parsing, and also for other tools
>>>> that want to have 450k example data and do not want to depend on
>>>> minfi. Those are the two main reasons for including the raw data as
>>>> well. And then there is the fact that while the data size is "big",
>>>> it is only 6 samples.
>>>>
>>>> Best,
>>>> Kasper
>>>>
>>>> On Thu, Nov 7, 2013 at 3:58 PM, Nicolas De Jay
>>>> <nicolas.de...@mail.mcgill.ca> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am preparing a data package and using the minfiData package as a
>>>>> reference. The .idat files in extdata and the .rda file in data are
>>>>> present in both the compressed source tarball and the installed
>>>>> copy (in my case, under ~/R/x86-64.../3.0/minfiData). Isn't
>>>>> this redundant? Is there a way to have the prospective user only
>>>>> download the .rda files?
>>>>>
>>>>> Sorry if my question is misguided and thanks in advance for your help.
>>>>>
>>>>> ---
>>>>> Nicolas De Jay
>>>>> M.Sc. Student
>>>>> Department of Human Genetics
>>>>> Montreal Children's Hospital Research Institute, McGill University
>>>>> Health Centre
>>>>> 4060 Ste Catherine West, PT-239
>>>>> Montreal, QC H3Z2Z3, Canada
>>>>> T: (514) 412-4440 | E: nicolas.de...@mail.mcgill.ca
>>>>>
>>>>> _______________________________________________
>>>>> Bioc-devel@r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>> --
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> Location: Arnold Building M1 B861
>> Phone: (206) 667-2793