Re: [Bioc-devel] Multiple colData in SummarizedExperiment

Ryan Wed, 17 Jun 2015 21:27:12 -0700

Oh wow, I didn't know you could put a DataFrame into a single column of another DataFrame. That actually solves a problem for me too (I don't intend to expose nested DataFrames to the users though).


On 6/17/15 7:23 PM, Martin Morgan wrote:

On 06/17/2015 11:41 AM, davide risso wrote:
Dear list,
I'm creating an R package to store RNA-seq data of a somewhat large project
in which I'm involved.
One of the initial goals is to compare different pre-processing pipelines, hence I have multiple expression matrices corresponding to the same samples.
The SummarizedExperiment class seems a good candidate, since I have
multiple expression matrices with the same rowData and colData information.
I have several sample-specific variables that I want to store with the
object, namely, experimental information (e.g., batch, date, experimental
condition, ...) and sample quality (e.g., proportion of aligned reads,
total duplicate reads, etc.).

Of course, I can always create one big data frame concatenating the two
(experimental info + sample quality), but it seems that, both conceptually
and practically, it might be useful to have two separate data frames.
Since this seems a reasonably standard type of information that
one would want to carry along, I was wondering if it would be possible /
useful to allow the user to have multiple data.frames in the colData slot
Actually, colData() is a DataFrame, and a DataFrame column can contain a DataFrame. So after
  example(SummarizedExperiment)

we could make some faux sample quality data

  quality = DataFrame(x=1:6, y=6:1, row.names=colnames(se1))

add this as a column in the colData()

  colData(se1)$quality = quality
(or create the SummarizedExperiment from a similar DataFrame up-front) and manage our grouped data
> colData(se1)
DataFrame with 6 rows and 2 columns
    Treatment     quality
  <character> <DataFrame>
A        ChIP    ########
B       Input    ########
C        ChIP    ########
D       Input    ########
E        ChIP    ########
F       Input    ########
> colData(se1[,1:2])$quality
DataFrame with 2 rows and 2 columns
          x         y
  <integer> <integer>
A         1         6
B         2         5
I'm not sure that this is any less confusing to the end user than having to manage a DataFrameList(), but it does not require any new features.
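
For the "up-front" route, a minimal sketch might look like the following. Everything here is made up for illustration and is not taken from the real project: the toy matrices, the assay names pipelineA / pipelineB, and the quality columns aligned and duplicates. It assumes the SummarizedExperiment package is installed.

```r
library(SummarizedExperiment)  # also attaches S4Vectors, which provides DataFrame()

# Hypothetical toy data: 4 genes x 6 samples, one matrix per pre-processing pipeline.
samples  <- LETTERS[1:6]
counts_a <- matrix(1:24, nrow = 4,
                   dimnames = list(paste0("gene", 1:4), samples))
counts_b <- counts_a * 2L

# Experimental information stays as ordinary columns ...
col_data <- DataFrame(Treatment = rep(c("ChIP", "Input"), 3),
                      row.names = samples)

# ... and the sample-quality variables are grouped in one nested DataFrame column.
col_data$quality <- DataFrame(
    aligned    = c(0.91, 0.88, 0.93, 0.90, 0.87, 0.92),
    duplicates = c(120L, 95L, 143L, 110L, 101L, 130L),
    row.names  = samples)

# Build the object from the nested colData in one step.
se <- SummarizedExperiment(
    assays  = list(pipelineA = counts_a, pipelineB = counts_b),
    colData = col_data)

# Subsetting samples carries the nested quality data along.
colData(se[, 1:2])$quality
```

The nested column is still reached with the usual `$` accessor, so a package could wrap `colData(se)$quality` in its own accessor if it prefers not to show the nesting to users.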
Martin
of SummarizedExperiment.

Best,
Davide


_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

