I just checked in a patch to the GenomicRanges package in which the
GRanges constructor will now convert NA values in strand to the
both/either strand indicator "*" and issue a warning to the end-user
that informs them of the change. The updated GenomicRanges package
should be available from bioconductor.org within the next 36 hours. Here
is an example:
> RangedData(IRanges(1,2))
RangedData with 1 row and 0 value columns across 1 space
space ranges |
<character> <IRanges> |
1 1 [1, 2] |
> as(RangedData(IRanges(1,2)), "GRanges")
GRanges with 1 range and 0 elementMetadata values
seqnames ranges strand |
<Rle> <IRanges> <Rle> |
[1] 1 [1, 2] * |
seqlengths
1
NA
Warning message:
In GRanges(seqnames = space(from), ranges = ranges, strand =
Rle(strand(from)), :
missing values in strand converted to "*"
> sessionInfo()
R version 2.11.0 Under development (unstable) (2010-03-22 r51355)
i386-apple-darwin9.8.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicRanges_0.1.3 IRanges_1.5.74
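For readers without the development packages installed, the behaviour described above can be approximated in base R. This is only a minimal sketch: `normalize_strand` is a hypothetical helper name, not the code that was checked in to GenomicRanges, and the warning text is simply copied from the session above.

```r
# Hypothetical helper (NOT the GenomicRanges implementation): map missing
# strand values to "*" and warn, mirroring the behaviour described above.
normalize_strand <- function(strand) {
  strand <- as.character(strand)
  if (anyNA(strand)) {
    warning("missing values in strand converted to \"*\"")
    strand[is.na(strand)] <- "*"
  }
  # Restrict the result to the three strand values GRanges accepts
  factor(strand, levels = c("+", "-", "*"))
}

# The warning is suppressed here only to keep the example output quiet
s <- suppressWarnings(normalize_strand(c("+", NA, "-")))
as.character(s)
```

Calling `normalize_strand()` without `suppressWarnings()` shows the warning, which is the point of the change: the conversion happens, but not silently.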
On 4/1/10 8:04 AM, Michael Lawrence wrote:
Thinking about this some more, it's somewhat analogous to the coercion to
factor in R, i.e. as.factor(c("male", "female")) returns something
reasonable, despite missing level information.
as.factor("male") would probably not be what I wanted, but we live with it,
since the alternative (requiring the levels argument) would probably be
worse.
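The factor analogy can be tried directly in base R. The sketch below only illustrates the point made above, namely that levels are inferred from whatever data happen to be present unless they are given explicitly; the variable names are arbitrary.

```r
# Levels are inferred from the data when none are supplied
f <- as.factor(c("male", "female"))
levels(f)   # both categories are present, so both become levels

# With a single value, only one level can be inferred
g <- as.factor("male")
levels(g)

# Supplying levels explicitly keeps the category that is absent from the data
h <- factor("male", levels = c("female", "male"))
levels(h)
```

The coercion to GRanges is similar in spirit: filling in "*" when no strand is given is a default that is usually reasonable, at the cost of occasionally not being what the user meant.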
On Thu, Apr 1, 2010 at 7:31 AM, Michael Lawrence<[email protected]> wrote:
On Thu, Apr 1, 2010 at 7:22 AM, Martin Morgan<[email protected]> wrote:
On 04/01/2010 07:12 AM, Michael Lawrence wrote:
On Thu, Apr 1, 2010 at 7:09 AM, Martin Morgan<[email protected]> wrote:
On 03/31/2010 07:11 PM, [email protected] wrote:
Dear bioc-sig-sequencing,
I would like to annotate ChIP-seq peaks for the Arabidopsis genome. In
trying to work through the GenomicFeatures vignette dated 03/27/10, I need to
convert my ChIP-seq peaks from a RangedData object to a GRanges object. In a
recent, but previous, Bioconductor development version, the conversion with
this particular RangedData object worked fine.
In this more recent Bioconductor development version, I get the following
error message:
gr_ChSeqPks<- as(rd0_chr1_s_8_trt_vs_INPctl, "GRanges")
Error in validObject(.Object) :
invalid class "GRanges" object: slot 'strand' contains missing values
rd0_chr1_s_8_trt_vs_INPctl
RangedData with 57 rows and 2 value columns across 1 space
space ranges | ARAB8 ARAB7INPCTL
<character> <IRanges> |<integer> <integer>
1 chr1 [ 617092, 617094] | 24 0
2 chr1 [1808262, 1808262] | 8 0
3 chr1 [3889445, 3889452] | 64 0
4 chr1 [4404410, 4404410] | 8 0
5 chr1 [7081127, 7081127] | 8 0
6 chr1 [7128574, 7128581] | 64 0
7 chr1 [7128592, 7128649] | 464 0
8 chr1 [7530777, 7530781] | 40 0
9 chr1 [7530784, 7530786] | 24 0
... ... ... ... ... ...
Hi,
rd = RangedData(IRanges(1, 10))
as(rd, "GRanges")
Error in validObject(.Object) :
invalid class "GRanges" object: slot 'strand' contains missing values
rd[["strand"]] = "*"
as(rd, "GRanges")
GRanges with 1 range and 0 elementMetadata values
seqnames ranges strand |
<Rle> <IRanges> <Rle> |
[1] 1 [1, 10] * |
seqlengths
1
NA
Martin
Shouldn't the coerce function just do this automatically?
Currently GRanges thinks of strand as '+', '-', '*', whereas IRanges
allows NA as well (hence the error) so coercing NA to * represents a
decision on the part of the investigator that '*' (strand irrelevant) is
synonymous with NA (no information about strand available). Part of the
motivation for this current state of affairs is that the use cases for
both NA and * were unclear, but course corrections are welcome.
Ok. I guess one could think of the coercion of a RangedData missing a
'strand' column to a GRanges as an equivalent statement, since GRanges
requires strand information. If that doesn't sound reasonable, a better
error message will help avoid questions like this in the future.
Michael
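The alternative mentioned above, keeping the error but making it more helpful, might look roughly like the following base R sketch. `check_strand` is a hypothetical function written for illustration; it is not the validity method used by GenomicRanges, and the message wording is an assumption.

```r
# Hypothetical strict check (NOT the package's validity method): refuse
# missing strand values, but tell the user how to fix the input.
check_strand <- function(strand) {
  n_missing <- sum(is.na(strand))
  if (n_missing > 0) {
    stop(sprintf(
      "strand contains %d missing value(s); set them to \"*\" if strand is irrelevant",
      n_missing
    ), call. = FALSE)
  }
  invisible(TRUE)
}

# Capture the error message instead of aborting, so it can be inspected
msg <- tryCatch(check_strand(c("+", NA)), error = function(e) conditionMessage(e))
msg
```

Compared with silent or warned conversion, this keeps the NA versus "*" decision with the investigator while still pointing at the fix (for example, `rd[["strand"]] = "*"` as shown earlier in the thread).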
Martin
sessionInfo()
R version 2.12.0 Under development (unstable) (2010-03-30 r51506)
x86_64-unknown-linux-gnu
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] biomaRt_2.3.5 GenomicFeatures_0.5.0 GenomicRanges_0.1.0
[4] IRanges_1.5.73
loaded via a namespace (and not attached):
[1] Biobase_2.7.5 Biostrings_2.15.26 BSgenome_1.15.20 DBI_0.2-5
[5] RCurl_1.3-1 RSQLite_0.8-4 rtracklayer_1.7.11 tools_2.12.0
[9] XML_2.8-1
Thanks,
P. Terry
[email protected]
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793