Re: [Bioc-sig-seq] Filtering Bowtie alignments (has nosubject before)

Patrick Aboyoun Thu, 22 Jan 2009 07:46:31 -0800

Nico,

The main short read aligner in Bioconductor is called matchPDict fromthe Biostrings package. It is comparable in speed to MAQ. ThepairwiseAlignment function is intended to provide a flexible alignmentframework for exploration rather than pipeline integration.



Patrick


João Fadista wrote:

Hi,
Dan Koboldt will present a poster at this year's Marco Island sequencing meeting evaluating short read aligners. He has compiled a partial list of aligners at: http://www.massgenomics.org/2009/01/short-read-aligners-update-at-agbt.html
Cheers,
João


-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of [email protected]
Sent: Thursday, January 22, 2009 9:51 AM
To: Martin Morgan
Cc: [email protected]
Subject: Re: [Bioc-sig-seq] Filtering Bowtie alignments (has nosubject before)

Hi Martin,

Thanks for that! Sorry that I forgot to put a subject in the original post.

I'm currently looking for some serious benchmarking of the different alignment 
software available (maq, bowtie, zoom, pairwiseAlignment in R, etc...) and 
although I've been scanning the literature and forums quite a lot, I didn't 
find anything done recently. Has anybody heard of something like that?

Cheers,

Nico


Quoting Martin Morgan <[email protected]>:
Hi  --

[email protected] writes:
Hi,
I'm currently comparing the alignment generated by bowtie and maq onthe data used in the HilbertVis pdf file of the ShortRead package(NCBI SRA
accession:
SRA000206). From the original .seq and .prb solexa files, I havegenerated the fastq files and aligned them to the ref. genome usingbowtie and maq.
I load both map files in R using readAligned. Then when filtering myreads, I realized that most of the default filter functions have beenwritten for maq and are therefore not necessarily available forbowtie.
Not really for MAQ, but you're right that the filters make assumptionsabout data available for filtering.
I'm actually interested in the alignQualityFilter in order to sortthose reads which map several times in the genome (as described inthe page 22 of the "I/0 and Quality Assessment using ShortRead" oflast November bioC tutorial session). But, this method relies on thequality slot of the object returned by the alignQuality accessor,which is NA in the case of a bowtie alignment. This is actuallynormal, as the final quality mapping value stored in this object isonly available in the maq map file and now in the bowtie one.
So I'd just would like to know if there would be another way tofilter out the reads mapping multiple times in the genome when usingan AlignedRead object obtained from a bowtie map file.Sorry that this sounds quite confusing... Here is the actual code andwhat
happens:

filt <- alignQualityFilter(threshold=1)

# works
aln<-readAligned( "H3K4me1", "run13_lane4.map", type="MAQMap",filter=filt)
# fails
aln2<-readAligned( "H3K4me1.bowtie", "run13_lane4.map",type="Bowtie",
filter=filt)
As far as I can tell, Bowtie relies on the read identifier beingunique for each read. So you can
 aln2 <- readAligned("H3K4me1.bowtie", "run13_lane4.map",
                     type="Bowtie")
 aln2[!srduplicated(id(aln2))]
Several important caveats. This requires ShortRead >1.1.37 (availablein svn now, with BiocLite Wednesday after about 1 Seattle time, allbeing well; the original bowtie parser [intentionally] ignored theid). Note that the above keeps the _first_ of the duplicatedidentifiers; MAQ chooses a random read to report (though the randomnumber can be set on the MAQ run and hence obtain repeatable results,or all alignments can be printed out, but the MAQ format whenreporting all reads is, I believe, different from that expected byShortRead and readAligned).
You can create your own more elaborate filters, and define them (e.g.,to be used in readAligned) with the srFilter function, see ?srFilterfor an example.
Hope that helps,

Martin
Error in x...@ranges[i] : subscript contains NAs
Any ideas how to filter a bowtie type alignment for reads mappingmultiple time in the genome?
Cheers,

N.
sessionInfo()
R version 2.9.0 Under development (unstable) (2009-01-04 r47472)x86_64-unknown-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_U
S.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC
_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDE
NTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] HilbertVis_1.1.4   ShortRead_1.1.33   lattice_0.17-20    BSgenome_1.11.8
[5] Biobase_2.3.9      Biostrings_2.11.22 IRanges_1.1.33

loaded via a namespace (and not attached):
[1] grid_2.9.0         Matrix_0.999375-17 tools_2.9.0

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center 1100Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Re: [Bioc-sig-seq] Filtering Bowtie alignments (has nosubject before)

Reply via email to