Hi Michael,
I would second your request. In a package I'll be submitting soon, I have
a work-around for this by defining a set of functions like
'hsAutosomes', 'hsAllosomes' etc. that return the respective set of
human chromosome names. Perhaps one could incorporate this in the
'Seqinfo' class, by additional columns similar to 'isCircular'. One
would still need an additional data source for this, since the
information about which chr is primary, autosome etc. is not contained
in a standard reference file.
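Roughly, such helpers could look like the following. This is only a sketch in base R, not the actual package code: the 'prefix' argument is made up for illustration, and the human chromosome sets are simply hard-coded.

```r
## Hypothetical helpers; the names follow the ones mentioned above,
## the 'prefix' argument is an assumption for illustration.
hsAutosomes <- function(prefix = "chr") paste0(prefix, 1:22)
hsAllosomes <- function(prefix = "chr") paste0(prefix, c("X", "Y"))

hsAutosomes()[1:3]  ## "chr1" "chr2" "chr3"
hsAllosomes("")     ## "X" "Y"
```

Hard-coding works for human, but the sets would have to come from an additional data source for other organisms.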
We've found that analysts often need to restrict seqlevels to certain
pre-defined sets of chromosomes. Given the variability across organisms, it
would be nice to have an abstraction.
We often see this in code:
keepSeqlevels(seqinfo, as.character(1:22))
keepSeqlevels(seqinfo, c(1:22, "X", "Y"))
Perhaps instead we could use the more abstract and arguably more readable:
keepAutosomes(seqinfo)
keepPrimaryChromosomes(seqinfo)
Not sure of the best term for the latter. It refers to the set of
chromosomes that are not assembly fragments but are generally in the
nucleus (when there is one).
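As a rough sketch of how such wrappers might sit on top of the existing 'Seqinfo' API: the two functions below are hypothetical and not part of GenomeInfoDb, they only handle 'Seqinfo' objects, and the human chromosome sets are hard-coded with unprefixed names, as in the calls above.

```r
library(GenomeInfoDb)

## Hypothetical wrappers, not existing Bioconductor functions.
## Chromosome sets are hard-coded for human and assume unprefixed names.
## A Seqinfo object is subset by name; intersect() keeps the original
## order and ignores set members that are absent from the object.
keepAutosomes <- function(seqinfo) {
    seqinfo[intersect(seqlevels(seqinfo), as.character(1:22))]
}
keepPrimaryChromosomes <- function(seqinfo) {
    seqinfo[intersect(seqlevels(seqinfo), c(1:22, "X", "Y"))]
}

si <- Seqinfo(c("1", "X", "GL000191.1"), c(1000, 2000, 300), NA, "mock1")
seqlevels(keepAutosomes(si))           ## "1"
seqlevels(keepPrimaryChromosomes(si))  ## "1" "X"
```

A real implementation would need a per-organism lookup rather than hard-coded sets.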
Does the current 'sortSeqlevels' function address this? E.g.
#+BEGIN_SRC R
library(GenomicRanges)
seqinfo <- Seqinfo(paste0("chr", c(10, 1, 3)), c(10000, 1000, 3000), NA,
"mock1")
seqinfo ## 'chr10', 'chr1', 'chr3'
sortSeqlevels(seqinfo) ## now sorted 'chr1', 'chr3', 'chr10'
#+END_SRC
It would also be nice to have a sort,Seqinfo method that sorts by the
natural ordering of the chromosomes, if there is one. Maybe the function
needs its own name, but either way, this is something that really needs to
be in the infrastructure.
I think the existing SeqnameStyle infrastructure should be able to support
this.
Best wishes
Julian
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel