The problem is best illustrated with an example (with comments in bold):

# Load libraries:
library('GenomicFeatures')
library("BSgenome.Hsapiens.UCSC.hg19�)

# Create two test transcripts, one on each strand
testTranscriptPlus <- GRanges(seqnames='chr1', 
ranges=IRanges(start=1e5+c(1,11), end=1e5+c(3,13)), strand='+')
testTranscriptMinus <- GRanges(seqnames='chr1', 
ranges=IRanges(start=1e5+c(1,11), end=1e5+c(3,13)), strand='-�)

# Use getSeq() to extract sequence

getSeq(Hsapiens, testTranscriptPlus)
  A DNAStringSet instance of length 2
    width seq
[1]     3 ACT
[2]     3 CAG

> testTranscriptMinus <- GRanges(seqnames='chr1', 
> ranges=IRanges(start=1e5+c(1,11), end=1e5+c(3,13)), strand='-')
> getSeq(Hsapiens, testTranscriptMinus)
  A DNAStringSet instance of length 2
    width seq
[1]     3 AGT
[2]     3 CTG

# By comparing the plus and minus transcripts it can be seen that getSeq 
returns the sequences in the 5� to 3� orientation no matter the strand (very 
nice feature by the way)
# Furthermore we would expect the transcript sequence of the 
testTranscriptMinus to be: CTGAGT (since we are on the minus strand).

# Now lets extract the sequences of the minus strand transcript using the 
extractTranscriptSeqs() function
extractTranscriptSeqs( Hsapiens, split( testTranscriptMinus, 
f=c('test','test')))
  A DNAStringSet instance of length 1
    width seq                                                                   
                                                          names               
[1]     6 AGTCTG                                                                
                                                          test

# From which it can be observed that extractTranscriptSeqs() have concatenated 
the exons without taking the order into account (which works fine on plus 
strand but results in non-exisiting transcript from the minus strand).

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] C

attached base packages:
[1] grDevices datasets  grid      parallel  stats     graphics  utils     
methods   base     

other attached packages:
 [1] GenomicFeatures_1.17.14            AnnotationDbi_1.27.9               
Biobase_2.25.0                     BSgenome.Hsapiens.UCSC.hg19_1.3.99
 [5] BSgenome_1.33.8                    Biostrings_2.33.13                 
XVector_0.5.7                      spliceR_1.7.1                     
 [9] plyr_1.8.1                         RColorBrewer_1.0-5                 
VennDiagram_1.6.7                  cummeRbund_2.7.2                  
[13] Gviz_1.9.11                        rtracklayer_1.25.13                
GenomicRanges_1.17.28              GenomeInfoDb_1.1.18               
[17] IRanges_1.99.24                    S4Vectors_0.1.2                    
fastcluster_1.1.13                 reshape2_1.4                      
[21] ggplot2_1.0.0                      RSQLite_0.11.4                     
DBI_0.2-7                          BiocGenerics_0.11.4         


-- 
Kindest regards
Kristoffer Vitting-Seerup, cand.scient. (M.Sc.), 
Ph.D Fellow
Sandelin Group

Bioinformatics Centre | Biotech Research & Innovation Centre (BRIC), Dep. Of 
Biology
University of Copenhagen
Building 1, 3th floor, office 3 (1-3-03)
Ole Maal�es Vej 5
DK-2200 Copenhagen N
Denmark
http://binf.ku.dk | http://www.bric.ku.dk







        [[alternative HTML version deleted]]

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to