Re: [Bioc-sig-seq] GenomicFeatures, error in type conversion RangeData to GRanges

Michael Lawrence Thu, 01 Apr 2010 13:33:17 -0700

I think this is still too pedantic. For example, the GRanges constructor
defaults strand to '*'. To be consistent with this, that would also have to emit a warning.



On Thu, Apr 1, 2010 at 12:01 PM, Patrick Aboyoun <[email protected]> wrote:

> I just checked in a patch to the GenomicRanges package in which the GRanges
> constructor will now convert NA values in strand to the both/either strand
> indicator "*" and issue a warning to the end-user that informs them of the
> change. The updated GenomicRanges package should be available from
> bioconductor.org within the next 36 hours. Here is an example:
>
>
> > RangedData(IRanges(1,2))
> RangedData with 1 row and 0 value columns across 1 space
>        space    ranges |
> <character> <IRanges> |
> 1           1    [1, 2] |
>
> > as(RangedData(IRanges(1,2)), "GRanges")
>
> GRanges with 1 range and 0 elementMetadata values
>    seqnames    ranges strand |
>       <Rle> <IRanges>  <Rle> |
> [1]        1    [1, 2]      * |
>
> seqlengths
>  1
> NA
> Warning message:
> In GRanges(seqnames = space(from), ranges = ranges, strand =
> Rle(strand(from)),  :
>  missing values in strand converted to "*"
>
> > sessionInfo()
> R version 2.11.0 Under development (unstable) (2010-03-22 r51355)
> i386-apple-darwin9.8.0
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] GenomicRanges_0.1.3 IRanges_1.5.74
>
>
>
>
>
> On 4/1/10 8:04 AM, Michael Lawrence wrote:
>
>> Thinking about this some more, it's somewhat analogous to the coercion to
>> factor in R, e.g. as.factor(c("male", "female")) returns something
>> reasonable, despite missing level information.
>>
>> as.factor("male") would probably not be what I wanted, but we live with it,
>> since the alternative (requiring the levels argument) would probably be
>> worse.
>>
>> On Thu, Apr 1, 2010 at 7:31 AM, Michael Lawrence <[email protected]> wrote:
>>
>>> On Thu, Apr 1, 2010 at 7:22 AM, Martin Morgan <[email protected]> wrote:
>>>
>>>> On 04/01/2010 07:12 AM, Michael Lawrence wrote:
>>>>
>>>>> On Thu, Apr 1, 2010 at 7:09 AM, Martin Morgan <[email protected]> wrote:
>>>>>
>>>>>> On 03/31/2010 07:11 PM, [email protected] wrote:
>>>>>>
>>>>>>> Dear bioc-sig-sequencing,
>>>>>>>
>>>>>>> I would like to annotate ChIP-seq peaks for the Arabidopsis genome. In
>>>>>>> trying to work through the GenomicFeatures vignette dated 03/27/10, I need
>>>>>>> to convert my ChIP-seq peaks from a RangedData object to a GRanges object.
>>>>>>> In a recent, but previous, Bioconductor development version, the
>>>>>>> conversion with this particular RangedData object worked fine.
>>>>>>>
>>>>>>> In this more recent Bioconductor development version, I get the following
>>>>>>> error message:
>>>>>>>
>>>>>>>> gr_ChSeqPks <- as(rd0_chr1_s_8_trt_vs_INPctl, "GRanges")
>>>>>>> Error in validObject(.Object) :
>>>>>>>   invalid class "GRanges" object: slot 'strand' contains missing values
>>>>>>>
>>>>>>>> rd0_chr1_s_8_trt_vs_INPctl
>>>>>>> RangedData with 57 rows and 2 value columns across 1 space
>>>>>>>           space               ranges   |     ARAB8 ARAB7INPCTL
>>>>>>>     <character>             <IRanges>    |<integer>    <integer>
>>>>>>> 1          chr1   [ 617092,  617094]   |        24           0
>>>>>>> 2          chr1   [1808262, 1808262]   |         8           0
>>>>>>> 3          chr1   [3889445, 3889452]   |        64           0
>>>>>>> 4          chr1   [4404410, 4404410]   |         8           0
>>>>>>> 5          chr1   [7081127, 7081127]   |         8           0
>>>>>>> 6          chr1   [7128574, 7128581]   |        64           0
>>>>>>> 7          chr1   [7128592, 7128649]   |       464           0
>>>>>>> 8          chr1   [7530777, 7530781]   |        40           0
>>>>>>> 9          chr1   [7530784, 7530786]   |        24           0
>>>>>>> ...         ...                  ... ...       ...         ...
>>>>>>>
>>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>> rd = RangedData(IRanges(1, 10))
>>>>>>> as(rd, "GRanges")
>>>>>> Error in validObject(.Object) :
>>>>>>  invalid class "GRanges" object: slot 'strand' contains missing values
>>>>>>
>>>>>>> rd[["strand"]] = "*"
>>>>>>> as(rd, "GRanges")
>>>>>> GRanges with 1 range and 0 elementMetadata values
>>>>>>    seqnames    ranges strand |
>>>>>>       <Rle>  <IRanges>   <Rle>  |
>>>>>> [1]        1   [1, 10]      * |
>>>>>>
>>>>>> seqlengths
>>>>>>  1
>>>>>> NA
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>> Shouldn't the coerce function just do this automatically?
>>>>>
>>>>>
>>>> Currently GRanges thinks of strand as '+', '-', or '*', whereas IRanges
>>>> allows NA as well (hence the error), so coercing NA to '*' represents a
>>>> decision on the part of the investigator that '*' (strand irrelevant) is
>>>> synonymous with NA (no information about strand available). Part of the
>>>> motivation for this current state of affairs is that the use cases for
>>>> both NA and '*' were unclear, but course corrections are welcome.
>>>>
>>>>
>>>>
>>>>
>>> OK. I guess one could think of coercing a RangedData that lacks a
>>> 'strand' column to a GRanges as an equivalent statement, since GRanges
>>> requires strand information. If that doesn't sound reasonable, a better
>>> error message would help avoid questions like this in the future.
>>>
>>> Michael
>>>
>>>
>>>
>>>
>>>
>>>
>>>> Martin
>>>>
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>> sessionInfo()
>>>>>>>>
>>>>>>>>
>>>>>>> R version 2.12.0 Under development (unstable) (2010-03-30 r51506)
>>>>>>> x86_64-unknown-linux-gnu
>>>>>>>
>>>>>>> locale:
>>>>>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>>>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>>>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>>>>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>>>
>>>>>>> attached base packages:
>>>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>>>
>>>>>>> other attached packages:
>>>>>>> [1] biomaRt_2.3.5         GenomicFeatures_0.5.0 GenomicRanges_0.1.0
>>>>>>> [4] IRanges_1.5.73
>>>>>>>
>>>>>>> loaded via a namespace (and not attached):
>>>>>>> [1] Biobase_2.7.5      Biostrings_2.15.26 BSgenome_1.15.20   DBI_0.2-5
>>>>>>> [5] RCurl_1.3-1        RSQLite_0.8-4      rtracklayer_1.7.11 tools_2.12.0
>>>>>>> [9] XML_2.8-1
>>>>>>>
>>>>>>> Thanks,
>>>>>>> P. Terry
>>>>>>> [email protected]
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioc-sig-sequencing mailing list
>>>>>>> [email protected]
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Martin Morgan
>>>>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>>>>> 1100 Fairview Ave. N.
>>>>>> PO Box 19024 Seattle, WA 98109
>>>>>>
>>>>>> Location: Arnold Building M1 B861
>>>>>> Phone: (206) 667-2793
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>
>

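For anyone still on a GenomicRanges build without Patrick's patch, the NA-to-"*" conversion he describes can be done by hand before coercing, in the same spirit as Martin's rd[["strand"]] = "*" workaround. The following is a minimal base-R sketch only: naStrandToStar is a hypothetical helper, not a GenomicRanges or IRanges function, and it simply mimics the behaviour described in this thread (NA strand values become "*", with a warning).

```r
## Hypothetical helper, NOT part of GenomicRanges or IRanges: a base-R sketch
## of the behaviour described in this thread, where missing strand values are
## replaced by "*" and the user is warned about the substitution.
naStrandToStar <- function(strand) {
    strand <- as.character(strand)   # accept character or factor input
    isNA <- is.na(strand)
    if (any(isNA)) {
        warning('missing values in strand converted to "*"')
        strand[isNA] <- "*"
    }
    strand
}

s <- suppressWarnings(naStrandToStar(c("+", NA, "-")))
print(s)   # [1] "+" "*" "-"
```

Whether to warn at all is exactly the point Michael questions at the top of this message; dropping the warning() call gives the silent behaviour he argues for. Applying it to a RangedData that already has a strand column containing NAs (e.g. rd[["strand"]] <- naStrandToStar(rd[["strand"]])) is an assumption that has not been checked against a particular IRanges version.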