Re: [Bioc-devel] do SummarizedExperiments really need colnames?

Morgan, Martin Sat, 05 Dec 2015 10:36:07 -0800

The philosophy motivating the check is that names make the relationship between 
samples and data explicit, rather than relying on fragile positional 
information. With this in mind, I wonder why your upstream workflow does not 
include dimnames on the matrix?


That said, the check was introduced in

------------------------------------------------------------------------
r68053 | [email protected] | 2012-07-27 03:35:55 -0400 (Fri, 27 Jul 2012) | 2 lines

SummarizedExperiment uses rowData=GRangesList() as default

------------------------------------------------------------------------

To the observations you mention below one could also add that the rownames() 
can be NULL, so there is an uncomfortable asymmetry.

I could (1) remove the check (but use the DataFrame() constructor in an 
admittedly hackish way, not wanting to rely on the internal new() function). I 
could also (2) construct row / column names as seq_len(nrow()) / 
seq_len(ncol()).

Or (3) the code could be tightened to more closely adhere to the philosophy 
above (for instance, I think duplication of columns implied by se[,2] = se[,1] 
is worth stop()ing over, and allowing colnames(se) = NULL only enables bad 
practice). Likely this would be disruptive.

For what it's worth, we have

> library(Biobase)
> eset = ExpressionSet(matrix(0, 1, 2))
> dimnames(eset)
[[1]]
[1] "1"

[[2]]
[1] "1" "2"
> colnames(eset) = NULL
Error in `sampleNames<-`(`*tmp*`, value = NULL) : 
  'value' length (0) must equal sample number in AssayData (2)

so dimnames are being imposed.

(2) would be my current compromise preference.
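
A minimal sketch of what option (2) might look like, using a plain matrix and base R only. The helper name `fill_dimnames` is hypothetical and is not part of SummarizedExperiment; a real change to the constructor would also have to deal with multiple assays and with a user-supplied colData.

```r
# Hypothetical helper (not part of any package): supply positional
# dimnames only where the matrix has none, leaving existing names alone.
fill_dimnames <- function(m) {
    if (is.null(rownames(m)))
        rownames(m) <- as.character(seq_len(nrow(m)))
    if (is.null(colnames(m)))
        colnames(m) <- as.character(seq_len(ncol(m)))
    m
}

m <- fill_dimnames(matrix(0, 1, 2))
dimnames(m)
```

For the 1 x 2 matrix used here, this yields the same dimnames as the ExpressionSet example above ("1" for the row, "1" and "2" for the columns).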

Martin
________________________________________
From: Bioc-devel [[email protected]] on behalf of Aaron Lun 
[[email protected]]
Sent: Saturday, December 05, 2015 7:36 AM
To: bioc-devel
Subject: Re: [Bioc-devel] do SummarizedExperiments really need colnames?

Hello all,

At the start of the SummarizedExperiment constructor, there's a code
block that throws an error if 'colData' is not specified and the assay
matrices don't have column names.

Is this really necessary? In many cases, I just want to get a matrix
into the SE0 object without having to worry about column names. It
doesn't seem like there's a requirement for this in the SE0 class,
either; it seems happy with 'colnames(se0) <- NULL', and setting
'colData' to a 'DataFrame' with 'NULL' row names doesn't break the
constructor.

The requirement for column names causes issues for some manipulations -
for example:

library(SummarizedExperiment)
out <- SummarizedExperiment(matrix(0, 10, 5),
                            colData=DataFrame(row.names=1:5))
out[,1] <- out[,2]

## Error in `rownames<-`(`*tmp*`, value = c("2", "2", "3", "4", "5")) :
##  duplicate rownames not allowed

While this is fair enough, it's a bit annoying if I didn't want or need
the names in the first place.

The error mentioned above precedes the construction of the missing
'colData', so if column names are missing, then a more general way to
construct the 'colData' would be to do 'new("DataFrame", nrows=ncol(assays))'.

Cheers,

Aaron

_______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel