Nicolas,

Experiment data packages serve different needs, including (1) a small dataset for testing or running the examples of a specific package, and (2) a larger dataset for teaching purposes and/or to showcase a real analysis workflow. You seem to suggest that your data package is tightly integrated with your "analysis" package. In general, as the package gets bigger, I think you do have a responsibility to make it useful to other analysis packages, so that several analysis packages can plug into it.
In this specific case, I would make sure the IDATs are deposited in a public database, and then I would include only .rda files in the data package, together with a script that retrieves the IDATs and packages them into the object in the data package.

Best,
Kasper

On Fri, Nov 8, 2013 at 2:00 PM, Sean Davis <sdav...@mail.nih.gov> wrote:

> On Fri, Nov 8, 2013 at 1:54 PM, Nicolas De Jay <nicolas.de...@mail.mcgill.ca> wrote:
>
> > In that case, I will try to see if the public databases have the kind of data sets I am trying to package and run the idea by the team that is assigned to the project I am developing. Thank you Martin, Sean and Kasper for your valuable insight!
>
> [EDITORIAL COMMENT]
> No matter what you decide about the experiment data package, if you have data that you are willing to share (and it sounds like you do), putting them in a public repository is a GOOD THING.
> [END EDITORIAL COMMENT]
>
> Sean
>
> > ---
> > Nicolas De Jay
> >
> > On Fri, Nov 8, 2013 at 9:07 AM, Sean Davis <sdav...@mail.nih.gov> wrote:
> > >
> > > On Fri, Nov 8, 2013 at 8:41 AM, Martin Morgan <mtmor...@fhcrc.org> wrote:
> > >>
> > >> On 11/07/2013 09:26 PM, Nicolas De Jay wrote:
> > >>>
> > >>> Thanks for the prompt answer. The data set I am packaging closely resembles that of minfiData except that there are 52 samples; the IDAT files together are some 800MB whereas the Rda file is closer to 150MB. It is worth noting that my experiment data package will be submitted to Bioconductor along with a software package which makes use of these samples in the vignette. With this in mind, can I omit the IDAT files? If this goes against Bioconductor's underlying design, what would you say is the maximum size of an experiment data package?
> > >>
> > >> Hi Nicolas -- Some things to bear in mind.
> > >
> > > Hi, Nicolas.
> > >
> > > I just wanted to note that experiment data packages are meant as a convenient way to distribute data so that reproducible workflows and documentation can be created easily. There are other options such as accessing the data directly from public repositories using Bioconductor tools that serve the same purpose. While accessing such online resources does necessitate a one-time network connection (after which packages like GEOquery can use locally cached data), when appropriate datasets exist in public repositories, it may be a perfectly viable alternative to experiment data packages. In this particular case, as of today in NCBI GEO, there are 1711 Human 450k samples with IDAT files available. I am not arguing that this route should replace experiment data packages, just that stable public data resources are an alternative to them to consider.
> > >
> > > Sean
> > >
> > >>
> > >> Files are compressed in package tar balls, so your IDAT files may have a considerably smaller effective size.
> > >>
> > >> Generally, original text files are a much better way to store external data than Rda files. For instance, rda files require updating when / if the class definition changes, and the provenance and content of the data is unambiguous.
> > >>
> > >> Experiment data packages are meant to provide reusable examples for pedagogic purposes. One would hope that minfiData fulfills this requirement. If not, then it would be better to continue the current discussion with Kasper and others in the community to identify an appropriately comprehensive data set for use across many relevant packages.
> > >>
> > >> There is no formal statement about the maximum size of experiment data packages, but one would need to make a strong argument for why a Gb of experiment data is necessary (including why existing experiment data packages are fundamentally inadequate), especially if it is to support a single package.
> > >>
> > >> Martin
> > >>
> > >>> ---
> > >>> Nicolas De Jay
> > >>>
> > >>> On Thu, Nov 7, 2013 at 9:38 PM, Kasper Daniel Hansen <kasperdanielhan...@gmail.com> wrote:
> > >>>>
> > >>>> To give some background: it is true that the RGsetEx object (in data/RGsetEx.rda) is a 1-1 correspondence with the raw data files in inst/extdata, so one could consider it redundant. However, having the IDAT files is convenient for testing parsing, and also for other tools that want to have 450k example data and do not want to depend on minfi. Those are the two main reasons for including the raw data as well. And then there is the fact that while the data size is "big" it is only 6 samples.
> > >>>>
> > >>>> Best,
> > >>>> Kasper
> > >>>>
> > >>>> On Thu, Nov 7, 2013 at 3:58 PM, Nicolas De Jay <nicolas.de...@mail.mcgill.ca> wrote:
> > >>>>>
> > >>>>> Hi,
> > >>>>>
> > >>>>> I am preparing a data package and using the minfiData package as a reference. The .idat files in extdata and the .rda file in data are both present in both the compressed tarball source and the installed copy directory (in my case, under ~/R/x86-64.../3.0/minfiData). Isn't this redundant? Is there a way to have the prospective user only download the .rda files?
> > >>>>>
> > >>>>> Sorry if my question is misguided and thanks in advance for your help.
> > >>>>>
> > >>>>> ---
> > >>>>> Nicolas De Jay
> > >>>>> M.Sc. Student
> > >>>>> Department of Human Genetics
> > >>>>> Montreal Children's Hospital Research Institute, McGill University Health Centre
> > >>>>> 4060 Ste Catherine West, PT-239
> > >>>>> Montreal, QC H3Z2Z3, Canada
> > >>>>> T: (514) 412-4440 | E: nicolas.de...@mail.mcgill.ca
> > >>>>>
> > >>>>> _______________________________________________
> > >>>>> Bioc-devel@r-project.org mailing list
> > >>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> > >>
> > >> --
> > >> Computational Biology / Fred Hutchinson Cancer Research Center
> > >> 1100 Fairview Ave. N.
> > >> PO Box 19024 Seattle, WA 98109
> > >>
> > >> Location: Arnold Building M1 B861
> > >> Phone: (206) 667-2793
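
Martin's remark above, that files are compressed in package tarballs, can be checked on a given set of IDAT files before deciding whether to ship them. The following is a minimal sketch in Python under stated assumptions: the function name `tarball_sizes` and the directory passed to it are hypothetical, and Python's gzip settings need not match those used by `R CMD build`, so the result is only a rough estimate of the size of the built `.tar.gz`.

```python
import os
import sys
import tarfile
import tempfile


def tarball_sizes(directory):
    """Return (raw_bytes, gzipped_tar_bytes) for all regular files under directory.

    Hypothetical helper for illustration; it is not part of R or Bioconductor.
    """
    raw_bytes = 0
    # Write the archive to an anonymous temporary file so that a large
    # directory is not held in memory.
    with tempfile.TemporaryFile() as tmp:
        with tarfile.open(fileobj=tmp, mode="w:gz") as tar:
            for root, _dirs, files in os.walk(directory):
                for name in sorted(files):
                    path = os.path.join(root, name)
                    raw_bytes += os.path.getsize(path)
                    # Store paths relative to the top-level directory.
                    tar.add(path, arcname=os.path.relpath(path, directory))
        # Closing the tar archive writes the gzip trailer; the end of the
        # temporary file is then the size of the compressed archive.
        packed_bytes = tmp.seek(0, os.SEEK_END)
    return raw_bytes, packed_bytes


if __name__ == "__main__":
    # Example: python tarball_sizes.py inst/extdata
    raw, packed = tarball_sizes(sys.argv[1])
    print(f"raw: {raw} bytes, gzipped tar: {packed} bytes")
```

How much smaller the archive is depends entirely on the files. IDATs are binary, so the saving may be modest, and it is worth measuring on the real data.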