There is a lot of meat here that I can't properly address now because I am heading out to serve as a BioC evangelist in Europe. I was looking over the as.env methods that you created, Michael, and I agree it would be useful to expand on them to support Rle objects. I probably won't be able to do much work on this until late June, but Michael, feel free to rework this as you see fit.

Cheers,
Patrick


On 6/5/10 9:18 AM, Charles C. Berry wrote:
On Fri, 4 Jun 2010, Patrick Aboyoun wrote:

Great thread on the subset function. It currently has two IRanges-based methods:

 showMethods("subset")
Function: subset (package base)
x="ANY"
x="DataTable"
x="Sequence"

Based on what was being discussed, I see two enhancement requests:

1) Expanding the scope of subset to allow reference to components of non-DataTable objects such as IRanges and GRanges instances:

## Currently not supported, but could be
ir <- IRanges(start = 1:10, end = 1:10)
subset(ir, start < 5)

2) Add support for subsetting by 'logical' Rle in the subset function.

The second request is straightforward to implement, since it can be done within the subset methods of the Sequence and DataTable virtual classes. If we limit the first to Ranges (virtual class) and GRanges (which doesn't inherit from Ranges) objects, then two more subset methods would suffice to achieve 1). Sound reasonable?
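
A minimal sketch of the two requests, in base R only. It is not the IRanges implementation: `subset_by_components` is a hypothetical helper, a plain named list stands in for a Ranges object, and base `rle()` stands in for a logical Rle.

```r
## Hypothetical helper, not part of IRanges: evaluate a predicate against
## the named components of an object, the way subset() does for the
## columns of a data.frame.
subset_by_components <- function(components, expr) {
  # 'components' is a named list of equal-length vectors (stand-ins for
  # start, end, ...); 'expr' is captured unevaluated and looked up first
  # in the components, then in the caller's frame.
  keep <- eval(substitute(expr), components, parent.frame())
  # A run-length-encoded logical (base class "rle") is expanded first.
  if (inherits(keep, "rle")) keep <- inverse.rle(keep)
  stopifnot(is.logical(keep))
  keep <- keep & !is.na(keep)   # treat NA as FALSE, as subset() does
  lapply(components, function(v) v[keep])
}

ranges <- list(start = 1:10, end = 1:10)

## Request 1: refer to components by name in the predicate
res <- subset_by_components(ranges, start < 5)

## Request 2: subset by a run-length-encoded logical
run_keep <- rle(rep(c(TRUE, FALSE), c(4, 6)))
res2 <- subset_by_components(ranges, run_keep)
```

Both calls keep the first four ranges. Real methods would dispatch on the Ranges and GRanges classes rather than take a list.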


Patrick

Perhaps this request pertaining to xtabs(..., subset = ...) is related.

Currently (rather, in IRanges_1.6.4)

library(IRanges)
ir <- RangedData(IRanges(start=1:10,width=1),space=rep(letters[1:2],5),z=rep(1:3,length=10))
xtabs(~z,as.data.frame(ir),subset = z > 1)
z
2 3
3 3
xtabs(~z,subset(ir,z>1))
z
2 3
3 3

xtabs(~z,ir,subset = z > 1)
Error in xj[i] : invalid subscript type 'closure'

xtabs(~z,subset(ir,space=='a'))
z
1 2 3
2 1 2
xtabs(~z,ir,subset = space=='a')
Error in xj[i] : invalid subscript type 'closure'


Can this be changed to allow use of the subset argument when the data arg is a RangedData (or GRanges) instance?

Thanks,

Chuck



On 6/4/10 10:06 PM, Steve Lianoglou wrote:
 Hi Vincent,


> the simplification that Steve seems to be asking for would allow
> implicit references to elementMetadata variables in the predicate.
> I am not in favor of such an extension of semantics of bracket.
>
 Just to be clear, I'm not suggesting referencing elementMetadata
 variables implicitly w/in brackets, but rather only when using
 `subset` (as `subset` does now with columns of a data.frame (when it's
 used *on* a data.frame))

 So, using your example gr object:

 GRanges with 10 ranges and 2 elementMetadata values
   seqnames    ranges strand |     score        GC
       <Rle> <IRanges>  <Rle> | <integer> <numeric>
 a   Chrom1  [ 1, 10]      - |         1 1.0000000
 b   Chrom2  [ 2, 10]      + |         2 0.8888889
 c   Chrom2  [ 3, 10]      + |         3 0.7777778
 d   Chrom2  [ 4, 10]      * |         4 0.6666667
 e   Chrom1  [ 5, 10]      * |         5 0.5555556
 f   Chrom1  [ 6, 10]      + |         6 0.4444444
 g   Chrom3  [ 7, 10]      + |         7 0.3333333
 h   Chrom3  [ 8, 10]      + |         8 0.2222222
 i   Chrom3  [ 9, 10]      - |         9 0.1111111
 j   Chrom3  [10, 10]      - |        10 0.0000000

 seqlengths
   Chrom1 Chrom2 Chrom3
      NA     NA     NA

 I was curious if this would be useful:

R> subset(gr, strand == "+" & score > 6)

 but I wasn't trying to propose having something like this:

R> gr[strand == "+" & score > 6]
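
For comparison, here is the existing data.frame behavior referred to above, sketched in base R. The data frame `df` is an illustrative stand-in for the strand and score columns of the example gr object, not a GRanges.

```r
## Illustrative stand-in for the strand and score columns of 'gr'
df <- data.frame(
  strand = c("-", "+", "+", "*", "*", "+", "+", "+", "-", "-"),
  score  = 1:10,
  row.names = letters[1:10]
)

## subset() resolves 'strand' and 'score' as columns of df
hits <- subset(df, strand == "+" & score > 6)

## With brackets, the columns must be referenced explicitly
same <- df[df$strand == "+" & df$score > 6, ]

rownames(hits)   # "g" "h"
```

The suggestion is for `subset` on a GRanges to resolve names in the first way, while leaving the bracket semantics untouched.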



_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


Charles C. Berry                            (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:[email protected]                UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901


