Hi Martin,
Thanks for that! Sorry that I forgot to put a subject in the original post.
I'm currently looking for some serious benchmarking of the different alignment
software available (maq, bowtie, zoom, pairwiseAlignment in R, etc...) and
although I've been scanning the literature and forums quite a lot, I didn't
find anything done recently. Has anybody heard of something like that?
Cheers,
Nico
Quoting Martin Morgan <[email protected]>:
Hi --
[email protected] writes:
Hi,
I'm currently comparing the alignment generated by bowtie and maq on
the data
used in the HilbertVis pdf file of the ShortRead package (NCBI SRA
accession:
SRA000206). From the original .seq and .prb solexa files, I have
generated the
fastq files and aligned them to the ref. genome using bowtie and maq.
I load both map files in R using readAligned. Then when filtering my
reads, I
realized that most of the default filter functions have been written for maq
and are therefore not necessarily available for bowtie.
Not really for MAQ, but you're right that the filters make assumptions
about data available for filtering.
I'm actually interested in the alignQualityFilter in order to sort
those reads
which map several times in the genome (as described in the page 22
of the "I/0
and Quality Assessment using ShortRead" of last November bioC tutorial
session). But, this method relies on the quality slot of the object
returned by
the alignQuality accessor, which is NA in the case of a bowtie
alignment. This
is actually normal, as the final quality mapping value stored in
this object is
only available in the maq map file and now in the bowtie one.
So I'd just would like to know if there would be another way to
filter out the
reads mapping multiple times in the genome when using an AlignedRead object
obtained from a bowtie map file.
Sorry that this sounds quite confusing... Here is the actual code and what
happens:
filt <- alignQualityFilter(threshold=1)
# works
aln<-readAligned( "H3K4me1", "run13_lane4.map", type="MAQMap", filter=filt)
# fails
aln2<-readAligned( "H3K4me1.bowtie", "run13_lane4.map", type="Bowtie",
filter=filt)
As far as I can tell, Bowtie relies on the read identifier being
unique for each read. So you can
aln2 <- readAligned("H3K4me1.bowtie", "run13_lane4.map",
type="Bowtie")
aln2[!srduplicated(id(aln2))]
Several important caveats. This requires ShortRead >1.1.37 (available
in svn now, with BiocLite Wednesday after about 1 Seattle time, all
being well; the original bowtie parser [intentionally] ignored the
id). Note that the above keeps the _first_ of the duplicated
identifiers; MAQ chooses a random read to report (though the random
number can be set on the MAQ run and hence obtain repeatable results,
or all alignments can be printed out, but the MAQ format when
reporting all reads is, I believe, different from that expected by
ShortRead and readAligned).
You can create your own more elaborate filters, and define them (e.g.,
to be used in readAligned) with the srFilter function, see ?srFilter
for an example.
Hope that helps,
Martin
Error in x...@ranges[i] : subscript contains NAs
Any ideas how to filter a bowtie type alignment for reads mapping
multiple time
in the genome?
Cheers,
N.
sessionInfo()
R version 2.9.0 Under development (unstable) (2009-01-04 r47472)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] HilbertVis_1.1.4 ShortRead_1.1.33 lattice_0.17-20 BSgenome_1.11.8
[5] Biobase_2.3.9 Biostrings_2.11.22 IRanges_1.1.33
loaded via a namespace (and not attached):
[1] grid_2.9.0 Matrix_0.999375-17 tools_2.9.0
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M2 B169
Phone: (206) 667-2793
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing