On 08/08/2017 03:59 AM, Angerer, Philipp wrote:



Hi Aaron,



I guess this would be a question for the SummarizedExperiment developers, 
though personally, I never liked ExpressionSet's inclination to slap names on 
everything.



Too bad we’re bound to SummarizedExperiment’s “rows” and “cols”. Since they 
always refer to features and samples, respectively: Why not name them that?

Language is a funny thing. In the ExpressionSet world, 'features' were actually
a misnomer, since they refer to spots (probes) on the microarray, rather than
summarized expression values of individual genes. Rectangular data from other
assays might well label the observations made on each sample as something
different from 'feature'. Likewise, the columns were called 'phenoData',
describing the phenotypes of the samples, but 'phenotype' has different
meanings in different disciplines (hey wait, my experiment used two genetically
different mice, we're talking about _geno_types, not phenotypes!). And of
course 'sample' has statistical meanings that only sometimes apply.

In the end it seemed better to use generic terms for a data class meant for general use.

Martin


There are already too many APIs in too many programming languages that 
confusingly have one or the other convention – if we know which is which, why 
not name them after that knowledge?

> It probably wouldn't be a good idea to store distances as expression matrices.
> However, if there is a need for it, we can add a new slot for distance
> matrices. I think SC3 has a similar requirement, so perhaps this would be more
> generally useful than I first thought. You can post an issue on the github
> repository to remind Davide or me to do it.


Distance matrices (cell×cell) can’t only come from cell×gene matrices. You can 
e.g. use dynamic time warping to create them from cell×gene×time arrays.
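
To make the cell×gene×time case concrete, here is a minimal base-R sketch. It is only one possible way to do this: the array `arr`, the helper names `series` and `dtw_dist`, and the choice of Euclidean local cost are all assumptions made for illustration, not anything defined in this thread or in an existing package.

```r
# Toy data (made up): 3 cells x 2 genes x 5 time points.
set.seed(1)
arr <- array(rnorm(3 * 2 * 5), dim = c(3, 2, 5),
             dimnames = list(paste0("cell", 1:3), paste0("gene", 1:2), NULL))

# Hypothetical helper: the gene x time matrix of one cell.
series <- function(a) matrix(arr[a, , ], nrow = dim(arr)[2])

# Classic dynamic time warping between two gene x time matrices.
# Local cost: Euclidean distance between the gene vectors at two time points.
dtw_dist <- function(x, y) {
  n <- ncol(x); m <- ncol(y)
  acc <- matrix(Inf, n + 1, m + 1)   # accumulated cost, padded by one row/column
  acc[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- sqrt(sum((x[, i] - y[, j])^2))
      acc[i + 1, j + 1] <- cost + min(acc[i, j + 1], acc[i + 1, j], acc[i, j])
    }
  }
  acc[n + 1, m + 1]
}

# Fill the cell x cell matrix pair by pair.
cells <- dimnames(arr)[[1]]
d <- matrix(0, length(cells), length(cells), dimnames = list(cells, cells))
for (a in seq_along(cells)) {
  for (b in seq_along(cells)) {
    d[a, b] <- dtw_dist(series(a), series(b))
  }
}
cell_dist <- as.dist(d)
```

The result is a cell×cell distance object that never passed through a single cell×gene matrix, which is why a dedicated slot for distances would need to accept matrices of arbitrary origin.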

> Finally, I'm not sure what advantages those ergonomics provide. Indeed, if
> every package defines its own plot() S4 method for SingleCellExperiment, they
> will clobber each other in the dispatch table, resulting in some interesting
> results dependent on package loading order. If you have destiny-specific data
> and methods, best to keep them separate rather than stuffing them into the SCE
> object.


I wrote that I could e.g. create a plot_dm method, which plots a diffusion map 
stored in a SCE.
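
A rough sketch of what such a package-specific generic could look like, using only the `methods` package. The class `ToySCE`, its slots, and the `"DiffusionMap"` entry name are hypothetical stand-ins so the example runs without Bioconductor; they are not the real SingleCellExperiment API.

```r
library(methods)

# Hypothetical stand-in for SingleCellExperiment (not the real class).
setClass("ToySCE", representation(counts = "matrix", reducedDims = "list"))

# A generic with its own name: it cannot collide with another package's
# plot() method registered for the same class.
setGeneric("plot_dm", function(object, ...) standardGeneric("plot_dm"))

setMethod("plot_dm", "ToySCE", function(object, dims = 1:2, ...) {
  dm <- object@reducedDims[["DiffusionMap"]]
  if (is.null(dm)) stop("no diffusion map stored in this object")
  plot(dm[, dims], xlab = paste0("DC", dims[1]), ylab = paste0("DC", dims[2]), ...)
  invisible(dm[, dims])  # return the plotted coordinates for further use
})

# Toy object: 4 genes x 5 cells, plus made-up 2-d coordinates for the 5 cells.
set.seed(1)
toy <- new("ToySCE",
           counts = matrix(rpois(20, 5), 4, 5),
           reducedDims = list(DiffusionMap = matrix(rnorm(10), 5, 2)))
```

Calling `plot_dm(toy)` then draws the two stored components, and loading another package that defines `plot()` for the same class would not affect it.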

Also, by ergonomics I didn’t mean the plot method. I meant fortify, names, $, 
and [[. Those would be very useful, as you could just do things like the 
following, and have autocompletion:
# `$` accesses counts (by gene) and rowData. `$<-` sets rowData
sce$Predicate1 <- sce$SampleMeta1 > 40
# fortify creates a data.frame containing cbind(t(counts), rowData)
qplot(Gene1, Gene2, colour = Predicate1, data = sce)


Just as you can do now with DiffusionMap objects.

Also I’m not sure if I got rowData and the “t” right in the above code ;) I 
meant cbind(counts as cell×gene, sampleMeta as cell×n_meta)
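
For what it is worth, the intended behaviour can be sketched with a toy S4 class. Everything here is an assumption for illustration: `MiniSCE` and `to_data_frame` are hypothetical names, not part of SingleCellExperiment or destiny. In SummarizedExperiment terms the per-cell metadata lives in colData, so that is the slot name used below.

```r
library(methods)

# Hypothetical minimal container: counts are gene x cell,
# colData holds one row of metadata per cell.
setClass("MiniSCE", representation(counts = "matrix", colData = "data.frame"))

# One row per cell: cbind(counts as cell x gene, metadata as cell x n_meta).
# A fortify() method would return the same kind of data.frame.
to_data_frame <- function(x) {
  cbind(as.data.frame(t(x@counts)), x@colData)
}

# names() lists every gene and every metadata column.
setMethod("names", "MiniSCE", function(x) names(to_data_frame(x)))

# `$` reads a gene's counts or a metadata column, whichever matches the name.
setMethod("$", "MiniSCE", function(x, name) to_data_frame(x)[[name]])

# `$<-` writes per-cell metadata only; counts are left untouched.
setMethod("$<-", "MiniSCE", function(x, name, value) {
  x@colData[[name]] <- value
  x
})

cells <- paste0("cell", 1:3)
sce <- new("MiniSCE",
           counts = matrix(c(1, 0, 3, 2, 5, 4), nrow = 2,
                           dimnames = list(c("Gene1", "Gene2"), cells)),
           colData = data.frame(SampleMeta1 = c(30, 45, 50), row.names = cells))

sce$Predicate1 <- sce$SampleMeta1 > 40
df <- to_data_frame(sce)
```

With ggplot2 installed, `qplot(Gene1, Gene2, colour = Predicate1, data = df)` should then work on the flattened data.frame; passing `sce` directly would additionally need a `fortify` method that calls something like `to_data_frame`.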

Best,
Phil



_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


