Hi Janet,

Good catch, thanks!

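getSeq() was completely rewritten back in June (to be more efficient
and to support GRanges), but unfortunately the rewrite introduced a
regression that shows up when getSeq() is called on a RangedData for
which length != nrow (that is, when the number of spaces differs from
the number of ranges).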

This is fixed in BSgenome 1.8.2 (release) and 1.9.1 (devel).
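
To confirm that an installation already includes the fix, comparing the
installed version against the numbers above should be enough. A minimal
sketch, assuming BSgenome is installed and that you are on the release
branch (on devel, compare against "1.9.1" instead):

```r
# Sketch: check whether the installed BSgenome release carries the getSeq() fix.
# Assumes the release branch; on devel, compare against "1.9.1" instead.
packageVersion("BSgenome")             # shows the installed version
packageVersion("BSgenome") >= "1.8.2"  # TRUE once the release fix is installed
```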

Cheers,
H.


On 11/15/2010 06:30 PM, Janet Young wrote:
Hi,

I just updated R to 2.12.0 and BioC to the corresponding latest
version.

I've found some new and possibly weird behavior in getSeq (Biostrings)
that's causing a little chaos for my code under the updated BioC. I
think I can find a workaround, but I'm also hoping getSeq might be
fairly easy to fix?

Here's my issue: I'm using getSeq to extract multiple sequences at once
from the mouse genome, specifying coordinates using RangedData objects.
That works OK if I use the whole RangedData object, but weird things
start to happen if I just use subsets of the RangedData object
(something to do with factors versus characters for space names,
perhaps, or the function is getting confused with GRanges vs RangedData?).

library(BSgenome.Mmusculus.UCSC.mm9)
library(IRanges)

tempRD <-
RangedData(IRanges(start=c(10000001,10000001),end=c(10000051,10000051)),space=c("chr1","chr2"))


#### simple getSeq looks good
getSeq(Mmusculus,tempRD)
[1] "CTCTTACGTTTTATTCCCTCTTTATCTCAGCTTAGATCAGGGTAAACTTTC"
[2] "AGGCCAACTTTTAGAGGTTGGCTCTCTCCTTCAATTGCATGTCCAGGGAGC"

### but if I subset the RangedData it doesn't look so good - I'd like
### the following command to give me just one sequence for the first region
### specified in tempRD, but instead it gives me that first region twice
getSeq(Mmusculus,tempRD[1,])
[1] "CTCTTACGTTTTATTCCCTCTTTATCTCAGCTTAGATCAGGGTAAACTTTC"
[2] "CTCTTACGTTTTATTCCCTCTTTATCTCAGCTTAGATCAGGGTAAACTTTC"

### also if I have unused space names I get an error

tempRD3 <-
RangedData(IRanges(start=c(10000001,10000001,10000001),end=c(10000051,10000051,10000051)),space=as.character(c("chr1","chr2","chr3"))
)

######
tempRD4 <- tempRD3[1:2,]

getSeq(Mmusculus,tempRD4)

Error in validObject(.Object) :
invalid class "GRanges" object: slot lengths are not all equal
In addition: Warning message:
In newCompressedList("CompressedSplitDataFrameList", x, splitFactor = f, :
data length is not a multiple of split variable

### one possible workaround - get rid of the unused space name
tempRD5 <-
RangedData(IRanges(start(tempRD4),end(tempRD4)),space=as.character(space(tempRD4)))

getSeq(Mmusculus,tempRD5) #### now this works

#############

Hope that all makes some sense - thanks very much,

Janet



-------------------------------------------------------------------

Dr. Janet Young (Trask lab)

Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., C3-168,
P.O. Box 19024, Seattle, WA 98109-1024, USA.

tel: (206) 667 1471 fax: (206) 667 6524
email: jayoung ...at... fhcrc.org

http://www.fhcrc.org/labs/trask/

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: [email protected]
Phone:  (206) 667-5791
Fax:    (206) 667-1319
