Re: [Bioc-sig-seq] GenomicFeatures, error in type conversion RangeData to GRanges

Patrick Aboyoun Thu, 01 Apr 2010 12:01:56 -0700

I just checked in a patch to the GenomicRanges package in which the GRanges constructor will now convert NA values in strand to the both/either strand indicator "*" and issue a warning to the end-user that informs them of the change. The updated GenomicRanges package should be available from bioconductor.org within the next 36 hours. Here is an example:


> RangedData(IRanges(1,2))
RangedData with 1 row and 0 value columns across 1 space
        space    ranges |
  <character> <IRanges> |
1           1    [1, 2] |

> as(RangedData(IRanges(1,2)), "GRanges")
GRanges with 1 range and 0 elementMetadata values
    seqnames    ranges strand |
       <Rle> <IRanges>  <Rle> |
[1]        1    [1, 2]      * |

seqlengths
 1
NA
Warning message:
In GRanges(seqnames = space(from), ranges = ranges, strand = Rle(strand(from)), :
  missing values in strand converted to "*"

> sessionInfo()
R version 2.11.0 Under development (unstable) (2010-03-22 r51355)
i386-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] GenomicRanges_0.1.3 IRanges_1.5.74
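
The conversion described above can be illustrated in base R, without GenomicRanges. This is only a minimal sketch of the idea (replace missing strand values with "*" and warn); `normalize_strand` is a hypothetical helper introduced for illustration and is not the code that was checked in to the package.

```r
# Hypothetical helper, NOT part of GenomicRanges: replace missing strand
# values with "*" and warn, mimicking the behaviour described in this message.
normalize_strand <- function(strand) {
  strand <- as.character(strand)
  if (any(is.na(strand))) {
    warning('missing values in strand converted to "*"')
    strand[is.na(strand)] <- "*"
  }
  # Anything other than "+", "-" or "*" is treated as an error in this sketch.
  bad <- setdiff(strand, c("+", "-", "*"))
  if (length(bad) > 0) {
    stop("invalid strand values: ", paste(bad, collapse = ", "))
  }
  factor(strand, levels = c("+", "-", "*"))
}

# Usage: catch the warning so that it is reported as a message.
x <- c("+", NA, "-")
out <- withCallingHandlers(
  normalize_strand(x),
  warning = function(w) {
    message("warning caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(out)
```

Warning rather than failing keeps existing coercions working while still telling the user that "no strand information" has been recorded as "either strand".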




On 4/1/10 8:04 AM, Michael Lawrence wrote:

Thinking about this some more, it's somewhat analogous to the coercion to
factor in R, i.e. as.factor(c("male", "female")) returns something
reasonable, despite missing level information.

as.factor("male") would probably not be what I wanted, but we live with it,
since the alternative (requiring the levels argument) would probably be
worse.
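
The factor analogy above can be tried directly in base R. The variable names below are illustrative only:

```r
# Coercing a character vector to a factor infers the levels from the data.
sex <- as.factor(c("male", "female"))
print(levels(sex))

# With a single observed value, only that value becomes a level, so the
# result silently lacks the "female" level.
one <- as.factor("male")
print(levels(one))

# Supplying the levels explicitly records the full set of possible values.
full <- factor("male", levels = c("female", "male"))
print(levels(full))
```

In the same way, a RangedData without a strand column carries no strand information, and the coercion has to pick a default.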

On Thu, Apr 1, 2010 at 7:31 AM, Michael Lawrence <[email protected]> wrote:

On Thu, Apr 1, 2010 at 7:22 AM, Martin Morgan <[email protected]> wrote:

On 04/01/2010 07:12 AM, Michael Lawrence wrote:

On Thu, Apr 1, 2010 at 7:09 AM, Martin Morgan <[email protected]> wrote:

On 03/31/2010 07:11 PM, [email protected] wrote:

Dear bioc-sig-sequencing,

I would like to annotate ChIP-seq peaks for the Arabidopsis genome. In
trying to work through the GenomicFeatures vignette dated 03/27/10, I need
to convert my ChIP-seq peaks from a RangedData object to a GRanges object.
In a recent, but previous, Bioconductor development version, the conversion
with this particular RangedData object worked fine.

In this more recent Bioconductor development version, I get the following
error message:

gr_ChSeqPks <- as(rd0_chr1_s_8_trt_vs_INPctl, "GRanges")

Error in validObject(.Object) :
   invalid class "GRanges" object: slot 'strand' contains missing values

rd0_chr1_s_8_trt_vs_INPctl

RangedData with 57 rows and 2 value columns across 1 space
          space               ranges   |     ARAB8 ARAB7INPCTL
    <character>            <IRanges>   | <integer>   <integer>
1          chr1   [ 617092,  617094]   |        24           0
2          chr1   [1808262, 1808262]   |         8           0
3          chr1   [3889445, 3889452]   |        64           0
4          chr1   [4404410, 4404410]   |         8           0
5          chr1   [7081127, 7081127]   |         8           0
6          chr1   [7128574, 7128581]   |        64           0
7          chr1   [7128592, 7128649]   |       464           0
8          chr1   [7530777, 7530781]   |        40           0
9          chr1   [7530784, 7530786]   |        24           0
...         ...                  ... ...       ...         ...

Hi,

rd = RangedData(IRanges(1, 10))
as(rd, "GRanges")

Error in validObject(.Object) :
  invalid class "GRanges" object: slot 'strand' contains missing values

rd[["strand"]] = "*"
as(rd, "GRanges")

GRanges with 1 range and 0 elementMetadata values
    seqnames    ranges strand |
       <Rle> <IRanges>  <Rle> |
[1]        1   [1, 10]      * |

seqlengths
  1
NA

Martin

Shouldn't the coerce function just do this automatically?

Currently GRanges thinks of strand as '+', '-', '*', whereas IRanges
allows NA as well (hence the error) so coercing NA to * represents a
decision on the part of the investigator that '*' (strand irrelevant) is
synonymous with NA (no information about strand available). Part of the
motivation for this current state of affairs is that the use cases for
both NA and * were unclear, but course corrections are welcome.

Ok. I guess one could think of the coercion of a RangedData missing a
'strand' column to a GRanges as an equivalent statement, since GRanges
requires strand information. If that doesn't sound reasonable, a better
error message will help avoid questions like this in the future.

Michael

Martin

sessionInfo()

R version 2.12.0 Under development (unstable) (2010-03-30 r51506)
x86_64-unknown-linux-gnu

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] biomaRt_2.3.5         GenomicFeatures_0.5.0 GenomicRanges_0.1.0
[4] IRanges_1.5.73

loaded via a namespace (and not attached):
[1] Biobase_2.7.5      Biostrings_2.15.26 BSgenome_1.15.20   DBI_0.2-5
[5] RCurl_1.3-1        RSQLite_0.8-4      rtracklayer_1.7.11 tools_2.12.0
[9] XML_2.8-1


Thanks,
P. Terry
[email protected]

       [[alternative HTML version deleted]]

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

