Re: [Bioc-sig-seq] GenomicFeatures, error in type conversion RangeData to GRanges

Michael Lawrence Thu, 01 Apr 2010 08:05:14 -0700

Thinking about this some more, it's somewhat analogous to the coercion to
factor in R, i.e. as.factor(c("male", "female")) returns something
reasonable, despite missing level information.


as.factor("male") would probably not be what I wanted, but we live with it,
since the alternative (requiring the levels argument) would probably be
worse.
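
To make the analogy concrete, here is a minimal base-R sketch of the factor behaviour I have in mind. The variable names (sex, one, full) are just illustrative, and it only shows as.factor()/factor(), not the RangedData to GRanges coercion itself:

```r
# Coercion without explicit levels: the levels are inferred from the data.
sex <- as.factor(c("male", "female"))
levels(sex)   # "female" "male" (sorted)

# With a single value, only one level can be inferred, which is
# probably not what was intended.
one <- as.factor("male")
levels(one)   # "male"

# Supplying the levels explicitly keeps the level absent from the data.
full <- factor("male", levels = c("female", "male"))
levels(full)  # "female" "male"
```

By the same reasoning, a coercion that fills in '*' when no strand column is present would guess a reasonable default rather than demand the information up front.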

On Thu, Apr 1, 2010 at 7:31 AM, Michael Lawrence <[email protected]> wrote:

>
>
> On Thu, Apr 1, 2010 at 7:22 AM, Martin Morgan <[email protected]> wrote:
>
>> On 04/01/2010 07:12 AM, Michael Lawrence wrote:
>> > On Thu, Apr 1, 2010 at 7:09 AM, Martin Morgan <[email protected]> wrote:
>> >
>> >> On 03/31/2010 07:11 PM, [email protected] wrote:
>> >>>  Dear bioc-sig-sequencing,
>> >>>
>> >>> I would like to annotate chip-seq peaks for the arabidopsis genome. In
>> >>> trying to work thru the GenomicFeatures vignette dated 03/27/10, I need to
>> >>> convert my ChIPSeq peaks from a RangedData object to a GRanges object. In a
>> >>> recent, but previous Bioconductor development version, the conversion with
>> >>> this particular RangedData object worked fine.
>> >>>
>> >>> In this more recent Bioconductor development version, I get the
>> >>> following error message:
>> >>>
>> >>>> gr_ChSeqPks <- as(rd0_chr1_s_8_trt_vs_INPctl, "GRanges")
>> >>> Error in validObject(.Object) :
>> >>>   invalid class "GRanges" object: slot 'strand' contains missing values
>> >>>> rd0_chr1_s_8_trt_vs_INPctl
>> >>> RangedData with 57 rows and 2 value columns across 1 space
>> >>>           space               ranges   |     ARAB8 ARAB7INPCTL
>> >>>     <character>            <IRanges>   | <integer>   <integer>
>> >>> 1          chr1   [ 617092,  617094]   |        24           0
>> >>> 2          chr1   [1808262, 1808262]   |         8           0
>> >>> 3          chr1   [3889445, 3889452]   |        64           0
>> >>> 4          chr1   [4404410, 4404410]   |         8           0
>> >>> 5          chr1   [7081127, 7081127]   |         8           0
>> >>> 6          chr1   [7128574, 7128581]   |        64           0
>> >>> 7          chr1   [7128592, 7128649]   |       464           0
>> >>> 8          chr1   [7530777, 7530781]   |        40           0
>> >>> 9          chr1   [7530784, 7530786]   |        24           0
>> >>> ...         ...                  ... ...       ...         ...
>> >>
>> >> Hi,
>> >>
>> >>> rd = RangedData(IRanges(1, 10))
>> >>> as(rd, "GRanges")
>> >> Error in validObject(.Object) :
>> >>  invalid class "GRanges" object: slot 'strand' contains missing values
>> >>> rd[["strand"]] = "*"
>> >>> as(rd, "GRanges")
>> >> GRanges with 1 range and 0 elementMetadata values
>> >>    seqnames    ranges strand |
>> >>       <Rle> <IRanges>  <Rle> |
>> >> [1]        1   [1, 10]      * |
>> >>
>> >> seqlengths
>> >>  1
>> >> NA
>> >>
>> >> Martin
>> >>
>> >>
>> > Shouldn't the coerce function just do this automatically?
>>
>> Currently GRanges thinks of strand as '+', '-', '*', whereas IRanges
>> allows NA as well (hence the error), so coercing NA to '*' represents a
>> decision on the part of the investigator that '*' (strand irrelevant) is
>> synonymous with NA (no information about strand available). Part of the
>> motivation for this current state of affairs is that the use cases for
>> both NA and '*' were unclear, but course corrections are welcome.
>>
>>
> Ok. I guess one could think of the coercion of a RangedData missing a
> 'strand' column to a GRanges as an equivalent statement, since GRanges
> requires strand information. If that doesn't sound reasonable, a better
> error message will help avoid questions like this in the future.
>
> Michael
>
>
>
>
>> Martin
>> >
>> >>>
>> >>>> sessionInfo()
>> >>> R version 2.12.0 Under development (unstable) (2010-03-30 r51506)
>> >>> x86_64-unknown-linux-gnu
>> >>>
>> >>> locale:
>> >>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>> >>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>> >>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>> >>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>> >>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>> >>>
>> >>> attached base packages:
>> >>> [1] stats     graphics  grDevices utils     datasets  methods   base
>> >>>
>> >>> other attached packages:
>> >>> [1] biomaRt_2.3.5         GenomicFeatures_0.5.0 GenomicRanges_0.1.0
>> >>> [4] IRanges_1.5.73
>> >>>
>> >>> loaded via a namespace (and not attached):
>> >>> [1] Biobase_2.7.5      Biostrings_2.15.26 BSgenome_1.15.20   DBI_0.2-5
>> >>> [5] RCurl_1.3-1        RSQLite_0.8-4      rtracklayer_1.7.11 tools_2.12.0
>> >>> [9] XML_2.8-1
>> >>>>
>> >>>
>> >>>
>> >>> Thanks,
>> >>> P. Terry
>> >>> [email protected]
>> >>>
>> >>>       [[alternative HTML version deleted]]
>> >>>
>> >>> _______________________________________________
>> >>> Bioc-sig-sequencing mailing list
>> >>> [email protected]
>> >>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>> >>
>> >>
>> >> --
>> >> Martin Morgan
>> >> Computational Biology / Fred Hutchinson Cancer Research Center
>> >> 1100 Fairview Ave. N.
>> >> PO Box 19024 Seattle, WA 98109
>> >>
>> >> Location: Arnold Building M1 B861
>> >> Phone: (206) 667-2793
>> >>
>> >
>>
>>
>
>
