On Fri, Nov 8, 2013 at 1:54 PM, Nicolas De Jay <nicolas.de...@mail.mcgill.ca> wrote:

> In that case, I will try to see if the public databases have the kind
> of data sets I am trying to package and run the idea by the team that
> is assigned to the project I am developing.  Thank you Martin, Sean
> and Kasper for your valuable insight!
>
>
[EDITORIAL COMMENT]
No matter what you decide about the experiment data package, if you have
data that you are willing to share (and it sounds like you do), putting
them in a public repository is a GOOD THING.
[END EDITORIAL COMMENT]

Sean



> ---
> Nicolas De Jay
>
> On Fri, Nov 8, 2013 at 9:07 AM, Sean Davis <sdav...@mail.nih.gov> wrote:
> >
> >
> >
> > On Fri, Nov 8, 2013 at 8:41 AM, Martin Morgan <mtmor...@fhcrc.org> wrote:
> >>
> >> On 11/07/2013 09:26 PM, Nicolas De Jay wrote:
> >>>
> >>> Thanks for the prompt answer.  The data set I am packaging closely
> >>> resembles that of minfiData except that there are 52 samples; the IDAT
> >>> files together are some 800MB whereas the Rda file is closer to 150MB.
> >>> It is worth noting that my experiment data package will be submitted
> >>> to Bioconductor along with a software package which makes use of these
> >>> samples in the vignette.  With this in mind, can I omit the IDAT
> >>> files?  If this goes against Bioconductor's underlying design, what
> >>> would you say is the maximum size of an experiment data package?
> >>
> >>
> >> Hi Nicolas -- Some things to bear in mind.
> >>
> >
> > Hi, Nicolas.
> >
> > I just wanted to note that experiment data packages are meant as a
> > convenient way to distribute data so that reproducible workflows and
> > documentation can be created easily.  There are other options, such as
> > accessing the data directly from public repositories using Bioconductor
> > tools, that serve the same purpose.  While accessing such online resources
> > does necessitate a one-time network connection (after which packages like
> > GEOquery can use locally cached data), when appropriate datasets exist in
> > public repositories, it may be a perfectly viable alternative to
> > experiment data packages.  In this particular case, as of today in NCBI
> > GEO, there are 1711 Human 450k samples with IDAT files available.  I am
> > not arguing that this route should replace experiment data packages, just
> > that stable public data resources are an alternative worth considering.
> >
> > Sean
> >
> >
> >>
> >> Files are compressed in package tarballs, so your IDAT files may have a
> >> considerably smaller effective size.
> >>
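
One way to gauge the effect described above before submitting is to compare the raw size of a data directory with the size of a gzip-compressed tarball built from it. The Python sketch below does this in memory. The helper name tarball_sizes is made up for illustration, and the sketch assumes gzip compression, as in a .tar.gz source tarball. Binary files such as IDATs may compress far less than text, so the actual saving has to be measured rather than assumed.

```python
import io
import os
import tarfile


def tarball_sizes(directory):
    """Return (raw_bytes, gzipped_tar_bytes) for all files under directory.

    Hypothetical helper for illustration only; not part of any package.
    """
    # Sum the on-disk sizes of every regular file below the directory.
    raw = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            raw += os.path.getsize(os.path.join(root, name))

    # Build the .tar.gz in memory so nothing is written to disk.
    buffer = io.BytesIO()
    arcname = os.path.basename(os.path.normpath(directory))
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        archive.add(directory, arcname=arcname)

    return raw, len(buffer.getvalue())
```

For example, calling tarball_sizes("inst/extdata") from the package source directory would return the two byte counts for the raw data files, which can then be compared directly.
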
> >> Generally, original text files are a much better way to store external
> >> data than Rda files. For instance, Rda files require updating when / if
> >> the class definition changes, whereas with the original files the
> >> provenance and content of the data are unambiguous.
> >>
> >> Experiment data packages are meant to provide reusable examples for
> >> pedagogic purposes. One would hope that minfiData fulfills this
> >> requirement. If not, then it would be better to continue the current
> >> discussion with Kasper and others in the community to identify an
> >> appropriately comprehensive data set for use across many relevant
> >> packages.
> >>
> >> There is no formal statement about the maximum size of experiment data
> >> packages, but one would need to make a strong argument for why a GB of
> >> experiment data is necessary (including why existing experiment data
> >> packages are fundamentally inadequate), especially if it is to support a
> >> single package.
> >>
> >> Martin
> >>
> >>>
> >>> ---
> >>> Nicolas De Jay
> >>>
> >>> On Thu, Nov 7, 2013 at 9:38 PM, Kasper Daniel Hansen
> >>> <kasperdanielhan...@gmail.com> wrote:
> >>>>
> >>>> To give some background: it is true that the RGsetEx object (in
> >>>> data/RGsetEx.rda) is in 1-1 correspondence with the raw data files in
> >>>> inst/extdata, so one could consider it redundant.  However, having the
> >>>> IDAT files is convenient for testing parsing, and also for other tools
> >>>> that want 450k example data without depending on minfi.  Those are the
> >>>> two main reasons for including the raw data as well.  There is also the
> >>>> fact that, while the data size is "big", it is only 6 samples.
> >>>>
> >>>> Best,
> >>>> Kasper
> >>>>
> >>>>
> >>>> On Thu, Nov 7, 2013 at 3:58 PM, Nicolas De Jay
> >>>> <nicolas.de...@mail.mcgill.ca> wrote:
> >>>>>
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> I am preparing a data package and using the minfiData package as a
> >>>>> reference.  The .idat files in extdata and the .rda file in data are
> >>>>> present in both the compressed source tarball and the installed
> >>>>> copy directory (in my case, under ~/R/x86-64.../3.0/minfiData).
> >>>>> Isn't this redundant?  Is there a way to have the prospective user
> >>>>> download only the .rda files?
> >>>>>
> >>>>> Sorry if my question is misguided and thanks in advance for your help.
> >>>>>
> >>>>> ---
> >>>>> Nicolas De Jay
> >>>>> M.Sc. Student
> >>>>> Department of Human Genetics
> >>>>> Montreal Children's Hospital Research Institute, McGill University
> >>>>> Health Centre
> >>>>> 4060 Ste Catherine West, PT-239
> >>>>> Montreal, QC H3Z2Z3, Canada
> >>>>> T: (514) 412-4440 | E: nicolas.de...@mail.mcgill.ca
> >>>>>
> >>>>> _______________________________________________
> >>>>> Bioc-devel@r-project.org mailing list
> >>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >> --
> >> Computational Biology / Fred Hutchinson Cancer Research Center
> >> 1100 Fairview Ave. N.
> >> PO Box 19024 Seattle, WA 98109
> >>
> >> Location: Arnold Building M1 B861
> >> Phone: (206) 667-2793
> >>
> >
> >
>
>


