Nicolas,

Experiment data packages serve different needs, including (1) a small
dataset for testing and running examples for a specific package, and (2) a
larger dataset for teaching purposes and/or to showcase a real analysis
workflow.  You seem to suggest that your data package is tightly integrated
with your "analysis" package.  In general, as the package gets bigger, I
think you do have a responsibility to make it useful to other analysis
packages, so that several analysis packages can plug into it.

In this specific case, I would make sure the IDATs are deposited in a
public database, and then I would include only the rda files in the data
package, together with a script which retrieves the IDATs and packages them
into the object in the data package.
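
Before deciding, it may help to measure how much the raw files actually shrink once they are in a tarball, since (as Martin notes further down the thread) files are compressed in package tarballs. The following is only a minimal Python sketch, not part of any Bioconductor tooling: it assumes the IDATs sit in a local directory and that the tarball is gzip-compressed, and the function name `tarball_sizes` is hypothetical.

```python
import io
import os
import sys
import tarfile


def tarball_sizes(directory):
    """Return (raw_bytes, gzipped_tar_bytes) for all files under `directory`.

    Hypothetical helper: builds a gzip-compressed tar archive in memory
    and compares its size with the summed size of the input files.
    """
    raw = 0
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for root, _dirs, files in os.walk(directory):
            for name in sorted(files):
                path = os.path.join(root, name)
                raw += os.path.getsize(path)
                # Store paths relative to the top-level directory.
                tar.add(path, arcname=os.path.relpath(path, directory))
    # The archive is finalized when the `with` block exits.
    return raw, len(buf.getvalue())


if __name__ == "__main__" and len(sys.argv) > 1:
    raw, packed = tarball_sizes(sys.argv[1])
    print(f"raw: {raw} bytes, tar.gz: {packed} bytes")
    if raw > 0:
        print(f"compressed/raw ratio: {packed / raw:.2f}")
```

Run it with the directory holding the IDATs as the only argument. The ratio depends entirely on the data, so it is worth measuring rather than assuming.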

Best,
Kasper


On Fri, Nov 8, 2013 at 2:00 PM, Sean Davis <sdav...@mail.nih.gov> wrote:

> On Fri, Nov 8, 2013 at 1:54 PM, Nicolas De Jay
> <nicolas.de...@mail.mcgill.ca> wrote:
>
> > In that case, I will try to see if the public databases have the kind
> > of data sets I am trying to package and run the idea by the team that
> > is assigned to the project I am developing.  Thank you Martin, Sean
> > and Kasper for your valuable insight!
> >
> >
> [EDITORIAL COMMENT]
> No matter what you decide about the experiment data package, if you have
> data that you are willing to share (and it sounds like you do), putting
> them in a public repository is a GOOD THING.
> [END EDITORIAL COMMENT]
>
> Sean
>
>
>
> > ---
> > Nicolas De Jay
> >
> > On Fri, Nov 8, 2013 at 9:07 AM, Sean Davis <sdav...@mail.nih.gov> wrote:
> > >
> > >
> > >
> > > On Fri, Nov 8, 2013 at 8:41 AM, Martin Morgan <mtmor...@fhcrc.org> wrote:
> > >>
> > >> On 11/07/2013 09:26 PM, Nicolas De Jay wrote:
> > >>>
> > >>> Thanks for the prompt answer.  The data set I am packaging closely
> > >>> resembles that of minfiData except that there are 52 samples; the IDAT
> > >>> files together are some 800MB whereas the Rda file is closer to 150MB.
> > >>> It is worth noting that my experiment data package will be submitted
> > >>> to Bioconductor along with a software package which makes use of these
> > >>> samples in the vignette.  With this in mind, can I omit the IDAT
> > >>> files?  If this goes against Bioconductor's underlying design, what
> > >>> would you say is the maximum size of an experiment data package?
> > >>
> > >>
> > >> Hi Nicolas -- Some things to bear in mind.
> > >>
> > >
> > > Hi, Nicolas.
> > >
> > > I just wanted to note that experiment data packages are meant as a
> > > convenient way to distribute data so that reproducible workflows and
> > > documentation can be created easily.  There are other options such as
> > > accessing the data directly from public repositories using Bioconductor
> > > tools that serve the same purpose.  While accessing such online resources
> > > does necessitate a one-time network connection (after which packages like
> > > GEOquery can use locally cached data), when appropriate datasets exist in
> > > public repositories, it may be a perfectly viable alternative to experiment
> > > data packages.  In this particular case, as of today in NCBI GEO, there are
> > > 1711 Human 450k samples with IDAT files available.  I am not arguing that
> > > this route should replace experiment data packages, just that stable public
> > > data resources are an alternative to them to consider.
> > >
> > > Sean
> > >
> > >
> > >>
> > >> Files are compressed in package tarballs, so your IDAT files may have a
> > >> considerably smaller effective size.
> > >>
> > >> Generally, original text files are a much better way to store external
> > >> data than Rda files. For instance, Rda files require updating when or if
> > >> the class definition changes, whereas the provenance and content of the
> > >> original files are unambiguous.
> > >>
> > >> Experiment data packages are meant to provide reusable examples for
> > >> pedagogic purposes. One would hope that minfiData fulfills this requirement.
> > >> If not, then it would be better to continue the current discussion with
> > >> Kasper and others in the community to identify an appropriately
> > >> comprehensive data set for use across many relevant packages.
> > >>
> > >> There is no formal statement about the maximum size of experiment data
> > >> packages, but one would need to make a strong argument for why a GB of
> > >> experiment data is necessary (including why existing experiment data
> > >> packages are fundamentally inadequate), especially if it is to support a
> > >> single package.
> > >>
> > >> Martin
> > >>
> > >>>
> > >>> ---
> > >>> Nicolas De Jay
> > >>>
> > >>> On Thu, Nov 7, 2013 at 9:38 PM, Kasper Daniel Hansen
> > >>> <kasperdanielhan...@gmail.com> wrote:
> > >>>>
> > >>>> To give some background: it is true that the RGsetEx object (in
> > >>>> data/RGsetEx.rda) is in 1-1 correspondence with the raw data files in
> > >>>> inst/extdata, so one could consider it redundant.  However, having the
> > >>>> IDAT files is convenient for testing parsing, and also for other tools
> > >>>> that want to have 450k example data and do not want to depend on minfi.
> > >>>> Those are the two main reasons for including the raw data as well.  And
> > >>>> then there is the fact that, while the data size is "big", it is only
> > >>>> 6 samples.
> > >>>>
> > >>>> Best,
> > >>>> Kasper
> > >>>>
> > >>>>
> > >>>> On Thu, Nov 7, 2013 at 3:58 PM, Nicolas De Jay
> > >>>> <nicolas.de...@mail.mcgill.ca> wrote:
> > >>>>>
> > >>>>>
> > >>>>> Hi,
> > >>>>>
> > >>>>> I am preparing a data package and using the minfiData package as a
> > >>>>> reference.  The .idat files in extdata and the .rda file in data are
> > >>>>> present in both the compressed source tarball and the installed
> > >>>>> copy directory (in my case, under ~/R/x86-64.../3.0/minfiData).
> > >>>>> Isn't this redundant?  Is there a way to have the prospective user
> > >>>>> download only the .rda files?
> > >>>>>
> > >>>>> Sorry if my question is misguided and thanks in advance for your help.
> > >>>>>
> > >>>>> ---
> > >>>>> Nicolas De Jay
> > >>>>> M.Sc. Student
> > >>>>> Department of Human Genetics
> > >>>>> Montreal Children's Hospital Research Institute,
> > >>>>> McGill University Health Centre
> > >>>>> 4060 Ste Catherine West, PT-239
> > >>>>> Montreal, QC H3Z2Z3, Canada
> > >>>>> T: (514) 412-4440 | E: nicolas.de...@mail.mcgill.ca
> > >>>>>
> > >>>>> _______________________________________________
> > >>>>> Bioc-devel@r-project.org mailing list
> > >>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> > >>>>
> > >>>>
> > >>>>
> > >>>
> > >>>
> > >>
> > >>
> > >> --
> > >> Computational Biology / Fred Hutchinson Cancer Research Center
> > >> 1100 Fairview Ave. N.
> > >> PO Box 19024 Seattle, WA 98109
> > >>
> > >> Location: Arnold Building M1 B861
> > >> Phone: (206) 667-2793
> > >>
> > >
> > >
> >
> >
>
>
