Re: [Bioc-sig-seq] seqselect on SimpleRleList and RangesList - bug? and request

Janet Young Mon, 14 Jun 2010 11:38:13 -0700

Thank you both - sounds like this touches on a lot of different functions. I've used findOverlaps quite a bit, so I had assumed other functions would work similarly, using the space names as indices. It would be good to make it really obvious to the user that this isn't the case.

From my naive user's point of view, it would be really useful to be able to use the chromosome names to select portions of a bunch of Rles (without requiring parallel space naming), in an analogous way to using findOverlaps on RangedData objects. Perhaps a switch on seqselect (usenames=TRUE)? Or introduce another function?

Or, if you go with Patrick's idea below about requiring space names either to be all NULL or all distinct and non-empty, then (by analogy with findOverlaps), you can use space names if they're present, and if they're not present go by list position. As for your question about what to do when name sets are not identical, I haven't looked at how you solved that for findOverlaps but perhaps something analogous could work for seqselect.
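
A minimal sketch of that "use names if present, otherwise go by position" behaviour, written in plain R rather than IRanges. Everything here is an assumption for illustration: select_by_name is a hypothetical helper, plain numeric vectors stand in for the Rle scores, and data frames with start/width columns stand in for the ranges. The score vectors are the expanded form of the coverage in the original example further down the thread.

```r
# Hypothetical helper, NOT part of IRanges: plain lists stand in for
# SimpleRleList (scores) and RangesList (ranges).
select_by_name <- function(scores, ranges) {
  if (is.null(names(scores)) || is.null(names(ranges))) {
    # No names on at least one side: fall back to list position
    stopifnot(length(scores) == length(ranges))
    keys <- seq_along(ranges)
  } else {
    # Names on both sides: match on them; 'ranges' may cover only a subset
    stopifnot(all(names(ranges) %in% names(scores)))
    keys <- names(ranges)
  }
  out <- lapply(keys, function(k) {
    r <- ranges[[k]]
    # Expand each (start, width) pair into the positions it covers
    idx <- unlist(Map(function(s, w) seq(s, length.out = w), r$start, r$width))
    scores[[k]][idx]
  })
  names(out) <- names(ranges)
  out
}

scores <- list(chrA = c(2, 2, 2, 7, 7, 3, 3, 3, 3),
               chrB = c(1, 1, 2, 1, 1, 1, 1, 1, 1))
exons_reordered <- list(chrB = data.frame(start = 3, width = 5),
                        chrA = data.frame(start = c(2, 4), width = c(2, 2)))

# Matched by name, so the order of the spaces no longer matters
res <- select_by_name(scores, exons_reordered)

# With names stripped, the same call pairs elements by position instead
res_pos <- select_by_name(unname(scores), unname(exons_reordered))
```

With names, res$chrA holds the chrA scores at positions 2 to 5 regardless of list order. Without names, the first range set is applied to the first score vector, which reproduces the mismatch reported in the original post.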

I guess I can do what I need to do by redefining my range information to use ordered factors as space names but it'd be really nice not to have to take those extra steps - it seems a lot more complicated than it needs to be. I think that'll work, because the way I was using seqselect on my real data was to define my ranges as RangedData objects, and then pass that to seqselect using ranges(my_rangeddata_object).

I can imagine a lot of cases where the user might want to select scores from the whole genome on just a subset of ranges that don't include one on every chromosome (and where chromosomes might not be sorted in the same way as the genome). Am I thinking about this the wrong way? Maybe there are better ways to represent the scores than SimpleRleList that would allow this more easily.
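
One way to picture the extra step this currently requires, sketched in plain R under stated assumptions: align_ranges is a hypothetical helper (not an IRanges function), and data frames with start/width columns stand in for the per-chromosome ranges. It reorders the ranges to follow the genome's chromosome order and pads chromosomes that have no ranges with an empty table, so that a purely position-based selection would line up.

```r
# Hypothetical helper, NOT part of IRanges: reorder 'ranges' to follow
# 'space_names' and pad spaces without ranges with a zero-row table.
align_ranges <- function(ranges, space_names) {
  stopifnot(!is.null(names(ranges)),
            !anyDuplicated(names(ranges)),
            all(names(ranges) %in% space_names))
  empty <- data.frame(start = numeric(0), width = numeric(0))
  aligned <- lapply(space_names, function(nm) {
    if (nm %in% names(ranges)) ranges[[nm]] else empty
  })
  names(aligned) <- space_names
  aligned
}

# Ranges on chrA only, aligned against a two-chromosome "genome"
exons_chrA_only <- list(chrA = data.frame(start = c(2, 4), width = c(2, 2)))
aligned <- align_ranges(exons_chrA_only, c("chrA", "chrB"))
```

After alignment the list has one element per chromosome, in genome order, with an empty entry for chrB.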

I don't know enough about the inner workings of IRanges to push strongly for any particular solution, but from the biologist's point of view here are a couple of questions to stimulate discussion. Why allow names to be specified at all if they're not meaningful? What kind of situation would a user be trying to represent with the "RangesList(a = IRanges(1,1), a = IRanges(1,2))" example? Could this situation simply be disallowed - would that mess up any real examples?


thanks again,

Janet




On Jun 12, 2010, at 6:04 AM, Patrick Aboyoun wrote:

But what is the scope of space? For example, the reduce operation has no concept of space (see below). In GenomicRanges, we introduced the concept of seqlengths to a number of classes including GRanges and GRangesList. There are certain restrictions on what can be held in a seqlengths slot, for example you can't mix NAs with non-NAs. Perhaps we can formalize space for all List objects so that you either have names of NULL or all the names must be distinct, non-empty strings. We would also have to define what happens in a binary operation involving two List objects when name sets are not identical.
> RangesList(a = IRanges(1,1), a = IRanges(1,2))
SimpleRangesList of length 2
$a
IRanges of length 1
    start end width
[1]     1   1     1

$a
IRanges of length 1
    start end width
[1]     1   2     2

> validObject(RangesList(a = IRanges(1,1), a = IRanges(1,2)))
[1] TRUE
> reduce(RangesList(a = IRanges(1,1), a = IRanges(1,2)))
SimpleRangesList of length 2
$a
IRanges of length 1
    start end width
[1]     1   1     1

$a
IRanges of length 1
    start end width
[1]     1   2     2
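
The naming rule proposed above (names are either NULL, or all distinct, non-empty strings) could be checked along these lines. This is only a sketch on plain R lists: valid_space_names is a hypothetical function name and is not part of IRanges or GenomicRanges.

```r
# Hypothetical validity check for the proposed rule; NOT an IRanges function.
valid_space_names <- function(x) {
  nms <- names(x)
  # Rule 1: no names at all is acceptable
  if (is.null(nms)) return(TRUE)
  # Rule 2: otherwise every name must be non-missing, non-empty and unique
  !anyNA(nms) && all(nzchar(nms)) && !anyDuplicated(nms)
}

valid_space_names(list(1, 2))          # unnamed list
valid_space_names(list(a = 1, b = 2))  # distinct names
valid_space_names(list(a = 1, a = 2))  # duplicated name
valid_space_names(list(a = 1, 2))      # partially named
```

Under this rule the first two calls pass and the last two fail, so a list like the duplicated-name RangesList above would be rejected.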



Patrick



On 6/12/10 5:47 AM, Michael Lawrence wrote:
On Sat, Jun 12, 2010 at 12:17 AM, Patrick Aboyoun<[email protected]> wrote:
Janet,
Most functions in the IRanges package follow the R convention of considering the elements of names to be loosely linked attributes rather than rigid keys. For convenience, functions such as $, [, [[ treat a list as a hash if it has names, but in most circumstances the names are ignored or copied without use. Even when there are names on elements, there are some odd corner cases that can cause problems. For example, if I wanted to have multiple list elements with the same name, then some important operations give unexpected results:
> list(a = 1, a = 2)["a"]
$a
[1] 1
If the issue is limited to enhancing the seqselect function to make it name-aware, it probably makes sense to go ahead with the enhancement. But the scope of this issue can grow quite large. For example, should names be used when adding two RleList objects? What should the following produce?
RleList(a = Rle(1)) + RleList(a = Rle(2), a = Rle(3), b = Rle(4))
Due to these types of ambiguities, I would rather focus on educating the user to be mindful that these are position-oriented rather than key-oriented objects and have them ensure that elements are in alignment.
Thoughts?
Sometimes in IRanges the names have a special semantic -- that of a "space". I guess this is limited to RangesList. Other data structures, like RleList, are often treated as being separated by space or chromosome, though their names have never explicitly been treated as the space. This inconsistency is probably OK, but it needs to be documented.
Patrick




On 6/11/10 4:06 PM, Janet Young wrote:
Hi,
I've been playing around with seqselect on scores stored in a SimpleRleList object to get subregions defined in a RangesList object.
I found a couple of things: first an enhancement request - would it be possible to allow seqselect to deal with cases where not every space (name) in the SimpleRleList has a corresponding space/name in the RangesList object?
The second is either a bug or else I've misunderstood the way seqselect is supposed to work, in a dangerous way - it looks like seqselect doesn't use the names of the list items to select scores, it just assumes that in the two lists the elements have the same names in the same order.
The code below should explain both issues much better than those descriptions.
thanks,

Janet



> library(IRanges)

Attaching package: 'IRanges'

The following object(s) are masked from 'package:base':
    cbind, Map, mapply, order, paste, pmax, pmax.int, pmin, pmin.int, rbind, rep.int, table
>
> ### generate some arbitrary scores
> track <- RangedData(RangesList(chrA = IRanges(start = c(1, 4, 6), width = c(3, 2, 4)), chrB = IRanges(start = c(1, 3, 6), width = c(3, 3, 4))))
> trackCoverage <- coverage(track, weight = list(chrA = c(2, 7, 3), chrB = c(1, 1, 1)))
>
> ### define subregions
> exons <- RangesList(chrA = IRanges(start = c(2, 4), width = c(2, 2)), chrB = IRanges(start = 3, width = 5))
>
> ### seqselect works if all spaces in trackCoverage have an element in exons
> seqselect(trackCoverage,exons )
SimpleRleList of length 2
$chrA
'integer' Rle of length 4 with 2 runs
 Lengths: 2 2
 Values : 2 7

$chrB
'integer' Rle of length 5 with 2 runs
 Lengths: 1 4
 Values : 2 1

>
> ### define subregions only on one chr
> exons_chrAonly <- RangesList(chrA = IRanges(start = c(2, 4), width = c(2, 2)))
> ### now seqselect doesn't work if some spaces don't have any elements
> seqselect(trackCoverage,exons_chrAonly )
Error in seqselect(trackCoverage, exons_chrAonly) :
'length(start)' must equal 'length(x)' when 'end' and 'width' are NULL
>
>
> ##### also, defining the regions with spaces in a different order seems to cause trouble as seqselect doesn't seem to be using the list's names - just going by order of elements
> exons_reorderchrs <- RangesList(chrB = IRanges(start = 3, width = 5), chrA = IRanges(start = c(2, 4), width = c(2, 2)))
> seqselect(trackCoverage,exons_reorderchrs )
SimpleRleList of length 2
$chrA
'integer' Rle of length 5 with 3 runs
 Lengths: 1 2 2
 Values : 2 7 3

$chrB
'integer' Rle of length 4 with 3 runs
 Lengths: 1 1 2
 Values : 1 2 1

>
> identical ( seqselect(trackCoverage,exons ) ,seqselect(trackCoverage,exons_reorderchrs ) )
[1] FALSE
>
> sessionInfo()
R version 2.11.1 (2010-05-31)
i386-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] IRanges_1.6.6

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

