Thank you both - sounds like this touches on a lot of different functions. I've used findOverlaps quite a bit, so I had assumed other functions would work similarly, using the space names as indices. It would be good to make it really obvious to the user that isn't the case.

From my naive user's point of view, it would be really useful to be able to use the chromosome names to select portions of a bunch of Rles (without requiring parallel space naming), in an analogous way to using findOverlaps on RangedData objects. Perhaps a switch on seqselect (usenames=TRUE)? Or introduce another function?

Or, if you go with Patrick's idea below about requiring space names either to be all NULL or all distinct and non-empty, then (by analogy with findOverlaps), you can use space names if they're present, and if they're not present go by list position. As for your question about what to do when name sets are not identical, I haven't looked at how you solved that for findOverlaps but perhaps something analogous could work for seqselect.
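
To make that rule concrete, here is a minimal sketch in base R, using plain lists rather than IRanges classes. The helpers `has_usable_names` and `pair_lists` are hypothetical names made up for illustration, not IRanges functions, and stopping with an error when a requested name is missing is only one possible answer to the "name sets are not identical" question.

```r
# Hypothetical helper: TRUE when every element has a distinct, non-empty name.
has_usable_names <- function(x) {
  nms <- names(x)
  !is.null(nms) && !anyNA(nms) && all(nzchar(nms)) && !anyDuplicated(nms)
}

# Hypothetical helper: subset each element of 'x' by the matching element of
# 'index'. Pairs by name when both lists have usable names, else by position.
pair_lists <- function(x, index) {
  if (has_usable_names(x) && has_usable_names(index)) {
    missing_names <- setdiff(names(index), names(x))
    if (length(missing_names) > 0L) {
      stop("no element in 'x' named: ", paste(missing_names, collapse = ", "))
    }
    x <- x[names(index)]   # keep only the requested elements, in index order
  } else if (length(x) != length(index)) {
    stop("lists without usable names must have the same length")
  }
  Map(function(values, i) values[i], x, index)
}

scores <- list(chrA = c(2, 2, 2, 9, 9), chrB = c(1, 2, 2, 1, 1))

only_b    <- pair_lists(scores, list(chrB = 2:3))              # chrB only
reordered <- pair_lists(scores, list(chrB = 2:3, chrA = 4:5))  # index order
```

With this rule, a subset of chromosomes, or chromosomes listed in a different order, is matched by name instead of by position.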

I guess I can do what I need to do by redefining my range information to use ordered factors as space names but it'd be really nice not to have to take those extra steps - it seems a lot more complicated than it needs to be. I think that'll work, because the way I was using seqselect on my real data was to define my ranges as RangedData objects, and then pass that to seqselect using ranges(my_rangeddata_object).

I can imagine a lot of cases where the user might want to select scores from the whole genome on just a subset of ranges that don't include one on every chromosome (and where chromosomes might not be sorted in the same way as the genome). Am I thinking about this the wrong way? Maybe there are better ways to represent the scores than SimpleRleList that would allow this more easily.

I don't know enough about the inner workings of IRanges to push strongly for any particular solution, but from the biologist's point of view here are a couple of questions to stimulate discussion. Why allow names to be specified at all if they're not meaningful? What kind of situation would a user be trying to represent with the "RangesList(a = IRanges(1,1), a = IRanges(1,2))" example? Could this situation simply be disallowed - would that mess up any real examples?

thanks again,

Janet




On Jun 12, 2010, at 6:04 AM, Patrick Aboyoun wrote:

But what is the scope of space? For example, the reduce operation has no concept of space (see below). In GenomicRanges, we introduced the concept of seqlengths to a number of classes including GRanges and GRangesList. There are certain restrictions on what can be held in a seqlengths slot; for example, you can't mix NAs with non-NAs. Perhaps we can formalize space for all List objects so that you either have names of NULL or all the names must be distinct, non-empty strings. We would also have to define what happens in a binary operation involving two List objects when name sets are not identical.


> RangesList(a = IRanges(1,1), a = IRanges(1,2))
SimpleRangesList of length 2
$a
IRanges of length 1
    start end width
[1]     1   1     1

$a
IRanges of length 1
    start end width
[1]     1   2     2

> validObject(RangesList(a = IRanges(1,1), a = IRanges(1,2)))
[1] TRUE
> reduce(RangesList(a = IRanges(1,1), a = IRanges(1,2)))
SimpleRangesList of length 2
$a
IRanges of length 1
    start end width
[1]     1   1     1

$a
IRanges of length 1
    start end width
[1]     1   2     2



Patrick



On 6/12/10 5:47 AM, Michael Lawrence wrote:



On Sat, Jun 12, 2010 at 12:17 AM, Patrick Aboyoun <[email protected]> wrote:
Janet,
Most functions in the IRanges package follow the R convention of treating the elements of names as loosely linked attributes rather than rigid keys. For convenience, functions such as $, [, [[ treat a list as a hash if it has names, but in most circumstances the names are ignored or copied without use. Even when there are names on elements, there are some odd corner cases that can cause problems. For example, if I wanted to have multiple list elements with the same name, then some important operations give unexpected results:

> list(a = 1, a = 2)["a"]
$a
[1] 1
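
As a small base R illustration of the same corner case (plain lists only, nothing specific to IRanges): character subsetting returns just the first match, while a logical comparison on the names returns every match, and `anyDuplicated` can be used to detect the situation up front.

```r
x <- list(a = 1, a = 2)

first_only  <- x["a"]                 # character subsetting: first match only
all_matches <- x[names(x) == "a"]     # logical subsetting: every element named "a"
dup_index   <- anyDuplicated(names(x))  # position of the first duplicate, 0 if none
```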

If the issue is limited to enhancing the seqselect function to make it name-aware, it probably makes sense to go ahead with the enhancement. But the scope of this issue can grow quite large. For example, should names be used when adding two RleList objects? What should the following produce?

RleList(a = Rle(1)) + RleList(a = Rle(2), a = Rle(3), b = Rle(4))

Due to these types of ambiguities, I would rather focus on educating the user to be mindful that these are position-oriented rather than key-oriented objects and have them ensure that elements are in alignment.
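
One way a user could ensure alignment before a position-oriented operation is sketched below in base R with plain lists. `check_aligned` is a hypothetical helper written for illustration, not part of IRanges.

```r
# Hypothetical guard: refuse to combine two lists position by position unless
# their lengths match and their names agree element for element.
check_aligned <- function(x, y) {
  if (length(x) != length(y)) stop("lists differ in length")
  if (!identical(names(x), names(y))) stop("list names are not in the same order")
  invisible(TRUE)
}

exons     <- list(chrA = 2:5, chrB = 3:7)
reordered <- exons[c("chrB", "chrA")]

check_aligned(exons, exons)   # passes silently

# Names in a different order are caught; capture the message instead of failing.
result <- tryCatch(check_aligned(exons, reordered),
                   error = function(e) conditionMessage(e))

# When names are unique, reordering by name restores alignment.
realigned <- reordered[names(exons)]
check_aligned(exons, realigned)
```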

Thoughts?



Sometimes in IRanges the names have a special semantic -- that of a "space". I guess this is limited to RangesList. Other data structures, like RleList, are often treated as being separated by space or chromosome, though their names have never explicitly been treated as the space. This inconsistency is probably OK, but it needs to be documented.

Patrick




On 6/11/10 4:06 PM, Janet Young wrote:
Hi,

I've been playing around with seqselect on scores stored in a SimpleRleList object to get subregions defined in a RangesList object.

I found a couple of things: first an enhancement request - would it be possible to allow seqselect to deal with cases where not every space (name) in the SimpleRleList has a corresponding space/name in the RangesList object?

The second is either a bug or else I've misunderstood the way seqselect is supposed to work, in a dangerous way - it looks like seqselect doesn't use the names of the list items to select scores, it just assumes that in the two lists the elements have the same names in the same order.

The code below should explain both issues much better than those descriptions.

thanks,

Janet



> library(IRanges)

Attaching package: 'IRanges'

The following object(s) are masked from 'package:base':

cbind, Map, mapply, order, paste, pmax, pmax.int, pmin, pmin.int, rbind, rep.int, table

>
> ### generate some arbitrary scores
> track <- RangedData(RangesList(chrA = IRanges(start = c(1, 4, 6), width=c(3, 2, 4)),chrB = IRanges(start = c(1, 3, 6), width=c(3, 3, 4))) )
> trackCoverage <- coverage(track, weight=list(chrA=c(2,7,3),chrB=c(1,1,1)) )
>
> ### define subregions
> exons <- RangesList(chrA = IRanges(start = c(2, 4), width = c(2,2)),chrB = IRanges(start = 3, width = 5))
>
> ### seqselect works if all spaces in trackCoverage have an element in exons
> seqselect(trackCoverage,exons )
SimpleRleList of length 2
$chrA
'integer' Rle of length 4 with 2 runs
 Lengths: 2 2
 Values : 2 7

$chrB
'integer' Rle of length 5 with 2 runs
 Lengths: 1 4
 Values : 2 1

>
> ### define subregions only on one chr
> exons_chrAonly <- RangesList(chrA = IRanges(start = c(2, 4), width = c(2, 2)))
> ### now seqselect doesn't work if some spaces don't have any elements
> seqselect(trackCoverage,exons_chrAonly )
Error in seqselect(trackCoverage, exons_chrAonly) :
'length(start)' must equal 'length(x)' when 'end' and 'width' are NULL
>
>
> ##### also, defining the regions with spaces in a different order seems to cause trouble as seqselect doesn't seem to be using the list's names - just going by order of elements
> exons_reorderchrs <- RangesList(chrB = IRanges(start = 3, width = 5),chrA = IRanges(start = c(2, 4), width = c(2,2)))
> seqselect(trackCoverage,exons_reorderchrs )
SimpleRleList of length 2
$chrA
'integer' Rle of length 5 with 3 runs
 Lengths: 1 2 2
 Values : 2 7 3

$chrB
'integer' Rle of length 4 with 3 runs
 Lengths: 1 1 2
 Values : 1 2 1

>
> identical ( seqselect(trackCoverage,exons ) , seqselect(trackCoverage,exons_reorderchrs ) )
[1] FALSE
>
> sessionInfo()
R version 2.11.1 (2010-05-31)
i386-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] IRanges_1.6.6

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
