But what is the scope of space? For example, the reduce operation has no 
concept of space (see below). In GenomicRanges, we introduced the 
concept of seqlengths to a number of classes, including GRanges and 
GRangesList. There are certain restrictions on what can be held in a 
seqlengths slot; for example, you can't mix NAs with non-NAs. Perhaps we 
can formalize space for all List objects so that either the names are 
NULL or all the names must be distinct, non-empty strings. We would 
also have to define what happens in a binary operation involving two 
List objects when their name sets are not identical.


 > RangesList(a = IRanges(1,1), a = IRanges(1,2))
SimpleRangesList of length 2
$a
IRanges of length 1
     start end width
[1]     1   1     1

$a
IRanges of length 1
     start end width
[1]     1   2     2

 > validObject(RangesList(a = IRanges(1,1), a = IRanges(1,2)))
[1] TRUE
 > reduce(RangesList(a = IRanges(1,1), a = IRanges(1,2)))
SimpleRangesList of length 2
$a
IRanges of length 1
     start end width
[1]     1   1     1

$a
IRanges of length 1
     start end width
[1]     1   2     2
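
The naming rule suggested above (names are either NULL or all distinct, 
non-empty strings) could be checked along the following lines. This is 
only a rough sketch in base R: hasValidSpaceNames is a hypothetical 
helper name, not an existing IRanges function, and it is shown on plain 
lists rather than on List objects.

```r
## Hypothetical helper (not part of IRanges): returns TRUE when the
## names of 'x' are NULL, or are all distinct, non-NA, non-empty strings.
hasValidSpaceNames <- function(x) {
    nms <- names(x)
    if (is.null(nms))
        return(TRUE)
    !any(is.na(nms)) && all(nzchar(nms)) && anyDuplicated(nms) == 0L
}

hasValidSpaceNames(list(1, 2))          # no names at all: TRUE
hasValidSpaceNames(list(a = 1, b = 2))  # distinct names: TRUE
hasValidSpaceNames(list(a = 1, a = 2))  # duplicated name: FALSE
hasValidSpaceNames(list(a = 1, 2))      # partially named: FALSE
```

Under such a rule, the RangesList with two elements named "a" shown 
above would presumably fail validation instead of passing it.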



Patrick



On 6/12/10 5:47 AM, Michael Lawrence wrote:
>
>
> On Sat, Jun 12, 2010 at 12:17 AM, Patrick Aboyoun wrote:
>
>     Janet,
>     Most functions in the IRanges package follow the R convention of
>     considering the elements of names to be loosely linked attributes
>     rather than rigid keys. For convenience, functions such as $, [,
>     and [[ treat a list as a hash if it has names, but in most
>     circumstances the names are ignored or copied without use. Even
>     when there are names on elements, there are some odd corner cases
>     that can cause problems. For example, if I wanted to have multiple
>     list elements with the same name, then some important operations
>     would give unexpected results:
>
>     > list(a = 1, a = 2)["a"]
>     $a
>     [1] 1
>
>     If the issue is limited to enhancing the seqselect function to
>     make it name-aware, it probably makes sense to go ahead with the
>     enhancement. But the scope of this issue can grow quite large. For
>     example, should names be used when adding RleList objects? What
>     should the following produce?
>
>     RleList(a = Rle(1)) + RleList(a = Rle(2), a = Rle(3), b = Rle(4))
>
>     Due to these types of ambiguities, I would rather focus on
>     educating the user to be mindful that these are position-oriented
>     rather than key-oriented objects and have them ensure that
>     elements are in alignment.
>
>     Thoughts?
>
>
>
> Sometimes in IRanges the names have a special semantic -- that of a 
> "space". I guess this is limited to RangesList. Other data structures, 
> like RleList, are often treated as being separated by space or 
> chromosome, though their names have never explicitly been treated as 
> the space. This inconsistency is probably OK, but it needs to be 
> documented.
>
>     Patrick
>
>
>
>
>     On 6/11/10 4:06 PM, Janet Young wrote:
>
>         Hi,
>
>         I've been playing around with seqselect on scores stored in a
>         SimpleRleList object to get subregions defined in a RangesList
>         object.
>
>         I found a couple of things:  first an enhancement request -
>         would it be possible to allow seqselect to deal with cases
>         where not every space (name) in the SimpleRleList has a
>         corresponding space/name in the RangesList object?
>
>         The second is either a bug or else I've misunderstood the way
>         seqselect is supposed to work, in a dangerous way - it looks
>         like seqselect doesn't use the names of the list items to
>         select scores, it just assumes that in the two lists the
>         elements have the same names in the same order.
>
>         The code below should explain both issues much better
>         than those descriptions.
>
>         thanks,
>
>         Janet
>
>
>
>         > library(IRanges)
>
>         Attaching package: 'IRanges'
>
>         The following object(s) are masked from 'package:base':
>
>            cbind, Map, mapply, order, paste, pmax, pmax.int, pmin,
>            pmin.int, rbind, rep.int, table
>
>         >
>         > ### generate some arbitrary scores
>         > track <- RangedData(RangesList(chrA = IRanges(start = c(1,
>         4, 6), width=c(3, 2, 4)),chrB = IRanges(start = c(1, 3, 6),
>         width=c(3, 3, 4))) )
>         > trackCoverage <- coverage(track,
>         weight=list(chrA=c(2,7,3),chrB=c(1,1,1)) )
>         >
>         > ### define subregions
>         > exons <- RangesList(chrA = IRanges(start = c(2, 4), width =
>         c(2,2)),chrB = IRanges(start = 3, width = 5))
>         >
>         > ### seqselect works if all spaces in trackCoverage have an
>         element in exons
>         > seqselect(trackCoverage,exons )
>         SimpleRleList of length 2
>         $chrA
>         'integer' Rle of length 4 with 2 runs
>          Lengths: 2 2
>          Values : 2 7
>
>         $chrB
>         'integer' Rle of length 5 with 2 runs
>          Lengths: 1 4
>          Values : 2 1
>
>         >
>         > ### define subregions only on one chr
>         > exons_chrAonly <- RangesList(chrA = IRanges(start = c(2, 4),
>         width = c(2, 2)))
>         > ### now seqselect doesn't work if some spaces don't have any
>         elements
>         > seqselect(trackCoverage,exons_chrAonly )
>         Error in seqselect(trackCoverage, exons_chrAonly) :
>          'length(start)' must equal 'length(x)' when 'end' and 'width'
>         are NULL
>         >
>         >
>         > ##### also, defining the regions with spaces in a different
>         order seems to cause trouble as seqselect doesn't seem to be
>         using the list's names - just going by order of elements
>         > exons_reorderchrs <- RangesList(chrB = IRanges(start = 3,
>         width = 5),chrA = IRanges(start = c(2, 4), width = c(2,2)))
>         > seqselect(trackCoverage,exons_reorderchrs )
>         SimpleRleList of length 2
>         $chrA
>         'integer' Rle of length 5 with 3 runs
>          Lengths: 1 2 2
>          Values : 2 7 3
>
>         $chrB
>         'integer' Rle of length 4 with 3 runs
>          Lengths: 1 1 2
>          Values : 1 2 1
>
>         >
>         > identical ( seqselect(trackCoverage,exons ) ,
>         seqselect(trackCoverage,exons_reorderchrs )  )
>         [1] FALSE
>         >
>         > sessionInfo()
>         R version 2.11.1 (2010-05-31)
>         i386-apple-darwin9.8.0
>
>         locale:
>         [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
>         attached base packages:
>         [1] stats     graphics  grDevices utils     datasets  methods
>           base
>
>         other attached packages:
>         [1] IRanges_1.6.6
>
>         _______________________________________________
>         Bioc-sig-sequencing mailing list
>         [email protected]
>         https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
>
>
>

