On Fri, Jun 4, 2010 at 11:51 PM, Patrick Aboyoun <[email protected]> wrote:

> Great thread on the subset function. It currently has two IRanges-based
> methods:
>
> > showMethods("subset")
> Function: subset (package base)
> x="ANY"
> x="DataTable"
> x="Sequence"
>
> Based on what was being discussed, I see two enhancement requests:
>
> 1) Expanding the scope of subset to allow reference to components of
> non-DataTable objects such as IRanges and GRanges instances:
>
> ## Currently not supported, but could be
> ir <- IRanges(start = 1:10, end = 1:10)
> subset(ir, start < 5)
>
> 2) Add support for subsetting by 'logical' Rle in the subset function.
>
>
This relates to an important question about the treatment of Rles. Right
now, we wrap several functions that operate on environments (like xtabs) by
converting objects to environments. Unfortunately, when our environment
contains Rles, many of these functions, including xtabs, break. This
litters my code with as.vector() calls. Perhaps in these cases as.env()
could be told to call as.vector() automatically. Wasteful, but easier than
reimplementing the functions from scratch (though subset is relatively
simple, so it was reimplemented for DataTable).
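
To make the as.vector() workaround concrete, here is a minimal sketch. It
assumes a current Bioconductor install where Rle is provided by S4Vectors
(at the time of this thread it lived in IRanges), and expand_rle() is a
hypothetical helper name, not an existing function:

```r
library(S4Vectors)  # assumption: Rle comes from S4Vectors in current Bioconductor

# Hypothetical helper: expand any Rle columns to ordinary vectors,
# leaving all other columns untouched.
expand_rle <- function(cols) {
  lapply(cols, function(col) if (is(col, "Rle")) as.vector(col) else col)
}

cols <- list(
  strand = Rle(c("+", "+", "-", "-", "-")),
  chrom  = Rle(c("chr1", "chr1", "chr1", "chr2", "chr2"))
)

# After expansion the columns are plain character vectors, so base
# functions such as xtabs can evaluate the formula against them.
plain <- as.data.frame(expand_rle(cols), stringsAsFactors = TRUE)
tab <- xtabs(~ strand + chrom, data = plain)
print(tab)
```

The expansion gives up the memory savings of the run-length encoding, which
is the "wasteful" part mentioned above.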

More tangentially related to the subset question is whether it is feasible
to use Rles as split factors. This comes up with functions like split,
tapply and by. For interactions, they require lists, and tapply always
requires a list, which makes it impossible to dispatch specifically on
Rles. Dispatching on "ANY", "list" would override every call to tapply but
is probably the simplest solution. I think the existing "Sequence" method
would work fine for tapply.
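
Until such dispatch exists, the split factors can be expanded by hand
before they are placed in the list. The following is only a sketch of that
workaround, again assuming Rle from S4Vectors in a current Bioconductor
install; the data values are made up for illustration:

```r
library(S4Vectors)  # assumption: Rle comes from S4Vectors in current Bioconductor

score  <- c(1, 2, 3, 4, 5, 6)
strand <- Rle(c("+", "+", "-", "-", "-", "*"))
chrom  <- Rle(c("chr1", "chr1", "chr1", "chr2", "chr2", "chr2"))

# Expand each Rle to an ordinary vector so that tapply receives
# a list of plain grouping vectors for the interaction.
groups <- lapply(list(strand = strand, chrom = chrom), as.vector)

# Sum of score within each strand-by-chrom cell; empty cells are NA.
sums <- tapply(score, groups, sum)
print(sums)
```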

What do you think, Patrick?

Michael

> The second request is straightforward to implement since it can be done
> within the subset methods of the Sequence and DataTable virtual classes. If
> we limit the first to Ranges (virtual class) and GRanges (which doesn't
> inherit from Ranges) objects, then two more subset methods would suffice to
> achieve 1). Sound reasonable?
>
>
> Patrick
>
>
>
> On 6/4/10 10:06 PM, Steve Lianoglou wrote:
>
>> Hi Vincent,
>>
>>
>>
>>> the simplification that Steve seems to be asking for would allow
>>> implicit references to elementMetadata variables in the predicate. I
>>> am not in favor of such an extension of semantics of bracket.
>>>
>>>
>> Just to be clear, I'm not suggesting referencing elementMetadata
>> variables implicitly w/in brackets, but rather only when using
>> `subset` (as `subset` does now with columns of a data.frame (when it's
>> used *on* a data.frame))
>>
>> So, using your example gr object:
>>
>> GRanges with 10 ranges and 2 elementMetadata values
>>  seqnames    ranges strand |     score        GC
>>     <Rle>  <IRanges>   <Rle>  |<integer>  <numeric>
>> a   Chrom1  [ 1, 10]      - |         1 1.0000000
>> b   Chrom2  [ 2, 10]      + |         2 0.8888889
>> c   Chrom2  [ 3, 10]      + |         3 0.7777778
>> d   Chrom2  [ 4, 10]      * |         4 0.6666667
>> e   Chrom1  [ 5, 10]      * |         5 0.5555556
>> f   Chrom1  [ 6, 10]      + |         6 0.4444444
>> g   Chrom3  [ 7, 10]      + |         7 0.3333333
>> h   Chrom3  [ 8, 10]      + |         8 0.2222222
>> i   Chrom3  [ 9, 10]      - |         9 0.1111111
>> j   Chrom3  [10, 10]      - |        10 0.0000000
>>
>> seqlengths
>>  Chrom1 Chrom2 Chrom3
>>     NA     NA     NA
>>
>> I was curious if this would be useful:
>>
>> R> subset(gr, strand == "+" & score > 6)
>>
>> but I wasn't trying to propose having something like this:
>>
>> R> gr[strand == "+" & score > 6]
>>
>>
>>
>
>


_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing