Re: [Bioc-sig-seq] GenomicFeatures, error in type conversion RangeData to GRanges

Michael Lawrence Fri, 02 Apr 2010 05:19:18 -0700

On Thu, Apr 1, 2010 at 10:30 PM, Patrick Aboyoun <[email protected]> wrote:


>  Michael,
> I'm not sure I follow. Converting user-supplied input is different from
> using an argument's default value during function execution. If an
> argument's default value is questionable enough to merit a warning, then it
> is better to get rid of the default value and require the user to provide
> one. I'm fine with making strand a required argument if users find the
> current default value to provide little value, or worse by causing some
> confusion.
>
>
I think my point was simply that R APIs are often written for maximal
convenience and avoid any pedantic constraints. There's the as.factor
example, the use of default values, etc. It seems that if the strand
imputation were documented, as it is for the GRanges constructor, and we
expected the user to read the documentation, there should not be an issue. I
think in 95% of the cases, the 'strand' is known or irrelevant. At least
this is true for mapped reads.


> Patrick
>
>
>
> On 4/1/10 1:32 PM, Michael Lawrence wrote:
>
> I think this is still too pedantic. For example, the GRanges constructor
> defaults to '*'. That should also emit a warning to be consistent with this.
>
>
> On Thu, Apr 1, 2010 at 12:01 PM, Patrick Aboyoun <[email protected]>wrote:
>
>> I just checked in a patch to the GenomicRanges package in which the
>> GRanges constructor will now convert NA values in strand to the both/either
>> strand indicator "*" and issue a warning to the end-user that informs them
>> of the change. The updated GenomicRanges package should be available from
>> bioconductor.org with the next 36 hours. Here is an example:
>>
>>
>> > RangedData(IRanges(1,2))
>> RangedData with 1 row and 0 value columns across 1 space
>>        space    ranges |
>> <character> <IRanges> |
>> 1           1    [1, 2] |
>>
>> > as(RangedData(IRanges(1,2)), "GRanges")
>>
>> GRanges with 1 range and 0 elementMetadata values
>>    seqnames    ranges strand |
>> <Rle> <IRanges> <Rle> |
>>  [1]        1    [1, 2]      * |
>>
>> seqlengths
>>  1
>> NA
>> Warning message:
>> In GRanges(seqnames = space(from), ranges = ranges, strand =
>> Rle(strand(from)),  :
>>  missing values in strand converted to "*"
>>
>> > sessionInfo()
>> R version 2.11.0 Under development (unstable) (2010-03-22 r51355)
>> i386-apple-darwin9.8.0
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>>  [1] GenomicRanges_0.1.3 IRanges_1.5.74
>>
>>
>>
>>
>>
>> On 4/1/10 8:04 AM, Michael Lawrence wrote:
>>
>>> Thinking about this some more, it's somewhat analogous to the coercion to
>>> factor in R, i.e. as.factor(c("male", "female")) returns something
>>> reasonable, despite missing level information.
>>>
>>> as.factor("male") would probably not be what I wanted, but we live with
>>> it,
>>> since the alternative (requiring the levels argument) would probably be
>>> worse.
>>>
>>> On Thu, Apr 1, 2010 at 7:31 AM, Michael Lawrence<[email protected]>
>>>  wrote:
>>>
>>>
>>>
>>>>
>>>> On Thu, Apr 1, 2010 at 7:22 AM, Martin Morgan<[email protected]>
>>>>  wrote:
>>>>
>>>>
>>>>
>>>>> On 04/01/2010 07:12 AM, Michael Lawrence wrote:
>>>>>
>>>>>
>>>>>> On Thu, Apr 1, 2010 at 7:09 AM, Martin Morgan<[email protected]>
>>>>>>
>>>>>>
>>>>> wrote:
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>> On 03/31/2010 07:11 PM, [email protected] wrote:
>>>>>>>
>>>>>>>
>>>>>>>>  Dear bioc-sig-sequencing,
>>>>>>>>
>>>>>>>> I would like to annotate chip-seq peaks for the arabidopsis genome.
>>>>>>>>
>>>>>>>>
>>>>>>>   In
>>>>>
>>>>>
>>>>>> trying to work thru the GenomicFeatures vignette dated 03/27/10, I
>>>>>>> need
>>>>>>>
>>>>>>>
>>>>>>  to
>>>>>
>>>>>
>>>>>> convert my ChIPSeq peaks from a RangedData object to a GRanges object.
>>>>>>>
>>>>>>>
>>>>>>   In a
>>>>>
>>>>>
>>>>>> recent, but previous Bioconductor development version, the conversion
>>>>>>>
>>>>>>>
>>>>>>  with
>>>>>
>>>>>
>>>>>> this particular RangedData object worked fine.
>>>>>>>
>>>>>>>
>>>>>>>> In this more recent Bioconductor development version, I get the
>>>>>>>>
>>>>>>>>
>>>>>>>  following
>>>>>
>>>>>
>>>>>> error message:
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> gr_ChSeqPks<- as(rd0_chr1_s_8_trt_vs_INPctl, "GRanges")
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Error in validObject(.Object) :
>>>>>>>>   invalid class "GRanges" object: slot 'strand' contains missing
>>>>>>>>
>>>>>>>>
>>>>>>>  values
>>>>>
>>>>>
>>>>>>  rd0_chr1_s_8_trt_vs_INPctl
>>>>>>>>>
>>>>>>>>>
>>>>>>>> RangedData with 57 rows and 2 value columns across 1 space
>>>>>>>>           space               ranges   |     ARAB8 ARAB7INPCTL
>>>>>>>>     <character>             <IRanges>    |<integer>    <integer>
>>>>>>>> 1          chr1   [ 617092,  617094]   |        24           0
>>>>>>>> 2          chr1   [1808262, 1808262]   |         8           0
>>>>>>>> 3          chr1   [3889445, 3889452]   |        64           0
>>>>>>>> 4          chr1   [4404410, 4404410]   |         8           0
>>>>>>>> 5          chr1   [7081127, 7081127]   |         8           0
>>>>>>>> 6          chr1   [7128574, 7128581]   |        64           0
>>>>>>>> 7          chr1   [7128592, 7128649]   |       464           0
>>>>>>>> 8          chr1   [7530777, 7530781]   |        40           0
>>>>>>>> 9          chr1   [7530784, 7530786]   |        24           0
>>>>>>>> ...         ...                  ... ...       ...         ...
>>>>>>>>
>>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> rd = RangedData(IRanges(1, 10))
>>>>>>>> as(rd, "GRanges")
>>>>>>>>
>>>>>>>>
>>>>>>> Error in validObject(.Object) :
>>>>>>>  invalid class "GRanges" object: slot 'strand' contains missing
>>>>>>> values
>>>>>>>
>>>>>>>
>>>>>>>> rd[["strand"]] = "*"
>>>>>>>> as(rd, "GRanges")
>>>>>>>>
>>>>>>>>
>>>>>>> GRanges with 1 range and 0 elementMetadata values
>>>>>>>    seqnames    ranges strand |
>>>>>>>       <Rle>  <IRanges>   <Rle>  |
>>>>>>> [1]        1   [1, 10]      * |
>>>>>>>
>>>>>>> seqlengths
>>>>>>>  1
>>>>>>> NA
>>>>>>>
>>>>>>> Martin
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> Shouldn't the coerce function just do this automatically?
>>>>>>
>>>>>>
>>>>> Currently GRanges thinks of strand as '+', '-', '*', whereas IRanges
>>>>> allows NA as well (hence the error) so coercing NA to * represents a
>>>>> decision on the part of the investigator that '*' (strand irrelevant)
>>>>> is
>>>>> synonymous with NA (no information about strand available). Part of the
>>>>> motivation for this current state of affairs is that the use case for
>>>>> both NA and * were unclear, but course corrections welcome.
>>>>>
>>>>>
>>>>>
>>>>>
>>>> Ok. I guess one could think of the coercion of a RangedData missing a
>>>> 'strand' column to a GRanges as an equivalent statement, since GRanges
>>>> requires strand information. If that doesn't sound reasonable, a better
>>>> error message will help avoid questions like this in the future.
>>>>
>>>> Michael
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> Martin
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>> sessionInfo()
>>>>>>>>>
>>>>>>>>>
>>>>>>>> R version 2.12.0 Under development (unstable) (2010-03-30 r51506)
>>>>>>>> x86_64-unknown-linux-gnu
>>>>>>>>
>>>>>>>> locale:
>>>>>>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>>>>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>>>>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>>>>>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>>>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>>>>
>>>>>>>> attached base packages:
>>>>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>>>>
>>>>>>>> other attached packages:
>>>>>>>> [1] biomaRt_2.3.5         GenomicFeatures_0.5.0 GenomicRanges_0.1.0
>>>>>>>> [4] IRanges_1.5.73
>>>>>>>>
>>>>>>>> loaded via a namespace (and not attached):
>>>>>>>> [1] Biobase_2.7.5      Biostrings_2.15.26 BSgenome_1.15.20
>>>>>>>> DBI_0.2-5
>>>>>>>> [5] RCurl_1.3-1        RSQLite_0.8-4      rtracklayer_1.7.11
>>>>>>>>
>>>>>>>>
>>>>>>>  tools_2.12.0
>>>>>
>>>>>
>>>>>>  [9] XML_2.8-1
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> P. Terry
>>>>>>>> [email protected]
>>>>>>>>
>>>>>>>>       [[alternative HTML version deleted]]
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Bioc-sig-sequencing mailing list
>>>>>>>> [email protected]
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Martin Morgan
>>>>>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>>>>>> 1100 Fairview Ave. N.
>>>>>>> PO Box 19024 Seattle, WA 98109
>>>>>>>
>>>>>>> Location: Arnold Building M1 B861
>>>>>>> Phone: (206) 667-2793
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioc-sig-sequencing mailing list
>>>>>>> [email protected]
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Martin Morgan
>>>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>>>> 1100 Fairview Ave. N.
>>>>> PO Box 19024 Seattle, WA 98109
>>>>>
>>>>> Location: Arnold Building M1 B861
>>>>> Phone: (206) 667-2793
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>        [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> [email protected]
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>
>>>
>>
>>
>
>

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Re: [Bioc-sig-seq] GenomicFeatures, error in type conversion RangeData to GRanges

Reply via email to