On Fri, Nov 8, 2013 at 1:54 PM, Nicolas De Jay <nicolas.de...@mail.mcgill.ca> wrote:
> In that case, I will try to see if the public databases have the kind
> of data sets I am trying to package and run the idea by the team that
> is assigned to the project I am developing. Thank you Martin, Sean
> and Kasper for your valuable insight!
>

[EDITORIAL COMMENT]
No matter what you decide about the experiment data package, if you have
data that you are willing to share (and it sounds like you do), putting
them in a public repository is a GOOD THING.
[END EDITORIAL COMMENT]

Sean

> ---
> Nicolas De Jay
>
> On Fri, Nov 8, 2013 at 9:07 AM, Sean Davis <sdav...@mail.nih.gov> wrote:
> >
> > On Fri, Nov 8, 2013 at 8:41 AM, Martin Morgan <mtmor...@fhcrc.org> wrote:
> >>
> >> On 11/07/2013 09:26 PM, Nicolas De Jay wrote:
> >>>
> >>> Thanks for the prompt answer. The data set I am packaging closely
> >>> resembles that of minfiData except that there are 52 samples; the IDAT
> >>> files together are some 800MB whereas the Rda file is closer to 150MB.
> >>> It is worth noting that my experiment data package will be submitted
> >>> to Bioconductor along with a software package which makes use of these
> >>> samples in the vignette. With this in mind, can I omit the IDAT
> >>> files? If this goes against Bioconductor's underlying design, what
> >>> would you say is the maximum size of an experiment data package?
> >>
> >> Hi Nicolas -- Some things to bear in mind.
> >>
> >
> > Hi, Nicolas.
> >
> > I just wanted to note that experiment data packages are meant as a
> > convenient way to distribute data so that reproducible workflows and
> > documentation can be created easily. There are other options such as
> > accessing the data directly from public repositories using Bioconductor
> > tools that serve the same purpose.
> > While accessing such online resources does necessitate a one-time
> > network connection (after which packages like GEOquery can use locally
> > cached data), when appropriate datasets exist in public repositories,
> > it may be a perfectly viable alternative to experiment data packages.
> > In this particular case, as of today in NCBI GEO, there are 1711 Human
> > 450k samples with IDAT files available. I am not arguing that this
> > route should replace experiment data packages, just that stable public
> > data resources are an alternative worth considering.
> >
> > Sean
> >
> >>
> >> Files are compressed in package tarballs, so your IDAT files may have a
> >> considerably smaller effective size.
> >>
> >> Generally, original text files are a much better way to store external
> >> data than Rda files. For instance, Rda files require updating when / if
> >> the class definition changes, whereas the provenance and content of the
> >> original files are unambiguous.
> >>
> >> Experiment data packages are meant to provide reusable examples for
> >> pedagogic purposes. One would hope that minfiData fulfills this
> >> requirement. If not, then it would be better to continue the current
> >> discussion with Kasper and others in the community to identify an
> >> appropriately comprehensive data set for use across many relevant
> >> packages.
> >>
> >> There is no formal statement about the maximum size of experiment data
> >> packages, but one would need to make a strong argument for why a GB of
> >> experiment data is necessary (including why existing experiment data
> >> packages are fundamentally inadequate), especially if it is to support a
> >> single package.
> >>
> >> Martin
> >>
> >>>
> >>> ---
> >>> Nicolas De Jay
> >>>
> >>> On Thu, Nov 7, 2013 at 9:38 PM, Kasper Daniel Hansen
> >>> <kasperdanielhan...@gmail.com> wrote:
> >>>>
> >>>> To give some background: it is true that the RGsetEx object (in
> >>>> data/RGsetEx.rda) is in 1-1 correspondence with the raw data files in
> >>>> inst/extdata, so one could consider it redundant. However, having the
> >>>> IDAT files is convenient for testing parsing, and also for other tools
> >>>> that want 450k example data without depending on minfi. Those are the
> >>>> two main reasons for including the raw data as well. And then there is
> >>>> the fact that, while the data size is "big", it is only 6 samples.
> >>>>
> >>>> Best,
> >>>> Kasper
> >>>>
> >>>>
> >>>> On Thu, Nov 7, 2013 at 3:58 PM, Nicolas De Jay
> >>>> <nicolas.de...@mail.mcgill.ca> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> I am preparing a data package and using the minfiData package as a
> >>>>> reference. The .idat files in extdata and the .rda file in data are
> >>>>> both present in the compressed source tarball as well as in the
> >>>>> installed copy directory (in my case, under
> >>>>> ~/R/x86-64.../3.0/minfiData). Isn't this redundant? Is there a way
> >>>>> to have the prospective user only download the .rda files?
> >>>>>
> >>>>> Sorry if my question is misguided and thanks in advance for your help.
> >>>>>
> >>>>> ---
> >>>>> Nicolas De Jay
> >>>>> M.Sc.
> >>>>> Student
> >>>>> Department of Human Genetics
> >>>>> Montreal Children's Hospital Research Institute, McGill University
> >>>>> Health Centre
> >>>>> 4060 Ste Catherine West, PT-239
> >>>>> Montreal, QC H3Z2Z3, Canada
> >>>>> T: (514) 412-4440 | E: nicolas.de...@mail.mcgill.ca
> >>>>>
> >>>>> _______________________________________________
> >>>>> Bioc-devel@r-project.org mailing list
> >>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>
> >> --
> >> Computational Biology / Fred Hutchinson Cancer Research Center
> >> 1100 Fairview Ave. N.
> >> PO Box 19024 Seattle, WA 98109
> >>
> >> Location: Arnold Building M1 B861
> >> Phone: (206) 667-2793