On Mon, Dec 16, 2013 at 5:00 AM, Julian Gehring <julian.gehr...@embl.de> wrote:

> Hi Michael,
>
> I would second your request.  In a package I'll be submitting soon, I have a
> work-around for this by defining a set of functions like 'hsAutosomes',
> 'hsAllosomes' etc. that return the respective set of human chromosome
> names.  Perhaps one could incorporate this in the 'seqinfo' class, by
> additional columns similar to 'isCircular'.  One would still need an
> additional data source for this, since the information about which chr is
> primary, autosome etc. is not contained in a standard reference file.
>
>
Yes, I think it should be stored with the Seqinfo. It could be imputed
(along with the isCircular I think) via the SeqnameStyle system that stores
different naming conventions for different species. At the very least, the
SeqnameStyle could inform a utility like keepAutosomes(), whether we modify
Seqinfo or not.
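
To make the idea concrete, here is a rough sketch in base R of what a
style-driven lookup might look like. The table 'autosomeTable', its style
keys and the helper 'autosomeLevels' are all made up for illustration; they
are not part of any existing package, and a real version would presumably
read from the SeqnameStyle data instead.

#+BEGIN_SRC R

## Hypothetical lookup table: autosome names per naming style.
autosomeTable <- list(
    human_ncbi = as.character(1:22),
    human_ucsc = paste0("chr", 1:22)
)

## Hypothetical helper: return the autosomes present in 'levels',
## in the natural order given by the lookup table.
autosomeLevels <- function(levels, style) {
    known <- autosomeTable[[style]]
    if (is.null(known))
        stop("no autosome table for style: ", style)
    known[known %in% levels]
}

lv <- c("chr10", "chr1", "chrX", "chrM", "chr3")
autosomeLevels(lv, "human_ucsc")  ## "chr1" "chr3" "chr10"

#+END_SRC

The resulting character vector could then be handed to keepSeqlevels(), in
place of the hard-coded as.character(1:22) quoted below.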


>
>> We've found that analysts often need to restrict seqlevels to certain
>> pre-defined sets of chromosomes. Given the variability across organisms, it
>> would be nice to have an abstraction.
>>
>> We often see this in code:
>>
>> keepSeqlevels(seqinfo, as.character(1:22))
>> keepSeqlevels(seqinfo, c(1:22, "X", "Y"))
>>
>> Perhaps instead we could have the more abstract and arguably more readable:
>>
>> keepAutosomes(seqinfo)
>> keepPrimaryChromosomes(seqinfo)
>>
>> Not sure of the best term for the latter. It refers to the set of
>> chromosomes that are not assembly fragments but are generally in the
>> nucleus (when there is one).
>>
>
>
> Does the current 'sortSeqlevels' function address this? E.g.
>
> #+BEGIN_SRC R
>
> library(GenomicRanges)
> seqinfo <- Seqinfo(paste0("chr", c(10, 1, 3)), c(10000, 1000, 3000), NA,
> "mock1")
> seqinfo  ## 'chr10', 'chr1', 'chr3'
> sortSeqlevels(seqinfo) ## now sorted 'chr1', 'chr3', 'chr10'
>
> #+END_SRC
>
>

Thanks, I was not aware of this one. That should do the trick.


>
>> It would also be nice to have a sort,Seqinfo method that sorts by the
>> natural ordering of the chromosomes, if there is one. Maybe the function
>> needs its own name, but either way, this is something that really needs to
>> be in the infrastructure.
>>
>> I think the existing SeqnameStyle infrastructure should be able to support
>> this.
>>
>
> Best wishes
> Julian
>


_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
