The problem is best illustrated with an example (with comments in bold): # Load libraries: library('GenomicFeatures') library("BSgenome.Hsapiens.UCSC.hg19�)
# Create two test transcripts, one on each strand testTranscriptPlus <- GRanges(seqnames='chr1', ranges=IRanges(start=1e5+c(1,11), end=1e5+c(3,13)), strand='+') testTranscriptMinus <- GRanges(seqnames='chr1', ranges=IRanges(start=1e5+c(1,11), end=1e5+c(3,13)), strand='-�) # Use getSeq() to extract sequence getSeq(Hsapiens, testTranscriptPlus) A DNAStringSet instance of length 2 width seq [1] 3 ACT [2] 3 CAG > testTranscriptMinus <- GRanges(seqnames='chr1', > ranges=IRanges(start=1e5+c(1,11), end=1e5+c(3,13)), strand='-') > getSeq(Hsapiens, testTranscriptMinus) A DNAStringSet instance of length 2 width seq [1] 3 AGT [2] 3 CTG # By comparing the plus and minus transcripts it can be seen that getSeq returns the sequences in the 5� to 3� orientation no matter the strand (very nice feature by the way) # Furthermore we would expect the transcript sequence of the testTranscriptMinus to be: CTGAGT (since we are on the minus strand). # Now lets extract the sequences of the minus strand transcript using the extractTranscriptSeqs() function extractTranscriptSeqs( Hsapiens, split( testTranscriptMinus, f=c('test','test'))) A DNAStringSet instance of length 1 width seq names [1] 6 AGTCTG test # From which it can be observed that extractTranscriptSeqs() have concatenated the exons without taking the order into account (which works fine on plus strand but results in non-exisiting transcript from the minus strand). > sessionInfo() R version 3.1.0 (2014-04-10) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] C attached base packages: [1] grDevices datasets grid parallel stats graphics utils methods base other attached packages: [1] GenomicFeatures_1.17.14 AnnotationDbi_1.27.9 Biobase_2.25.0 BSgenome.Hsapiens.UCSC.hg19_1.3.99 [5] BSgenome_1.33.8 Biostrings_2.33.13 XVector_0.5.7 spliceR_1.7.1 [9] plyr_1.8.1 RColorBrewer_1.0-5 VennDiagram_1.6.7 cummeRbund_2.7.2 [13] Gviz_1.9.11 rtracklayer_1.25.13 GenomicRanges_1.17.28 GenomeInfoDb_1.1.18 [17] IRanges_1.99.24 S4Vectors_0.1.2 fastcluster_1.1.13 reshape2_1.4 [21] ggplot2_1.0.0 RSQLite_0.11.4 DBI_0.2-7 BiocGenerics_0.11.4 -- Kindest regards Kristoffer Vitting-Seerup, cand.scient. (M.Sc.), Ph.D Fellow Sandelin Group Bioinformatics Centre | Biotech Research & Innovation Centre (BRIC), Dep. Of Biology University of Copenhagen Building 1, 3th floor, office 3 (1-3-03) Ole Maal�es Vej 5 DK-2200 Copenhagen N Denmark http://binf.ku.dk | http://www.bric.ku.dk [[alternative HTML version deleted]]
_______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel