Hello Ola,

We've determined that your BAM file has quite a few errors in the alignments on the negative strand. As part of our work as the ENCODE Data Coordination Center, we receive thousands of BAM files, which we validate with a program called validateFiles. This program checks each alignment in a BAM file against the reference genome to make sure it does not contain more mismatches than expected.
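To illustrate the kind of per-alignment check described above, here is a minimal Python sketch. It is not validateFiles' actual code, and the function names `parse_cigar` and `count_mismatches` are hypothetical. It walks a CIGAR string using the operation semantics from the SAM specification: SEQ and CIGAR are stored relative to the forward strand, soft-clipped (S) bases consume SEQ but are never compared, and only M/=/X positions are checked against the reference.

```python
import re

# One CIGAR operation: a length followed by an op character (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Ops that advance through SEQ, and ops that advance along the reference.
CONSUMES_QUERY = set("MIS=X")
CONSUMES_REF = set("MDN=X")


def parse_cigar(cigar):
    """Split a CIGAR string such as '63S37M' into (length, op) pairs."""
    if not re.fullmatch(r"(?:\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def count_mismatches(cigar, seq, ref, ref_start):
    """Count aligned bases where SEQ differs from the reference.

    cigar and seq are forward-strand, as stored in SAM/BAM.
    ref_start is the 0-based offset in ref of the first aligned base.
    Comparison is case-insensitive; clipped bases are skipped.
    """
    qi, ri = 0, ref_start
    mismatches = 0
    for length, op in parse_cigar(cigar):
        if op in "M=X":
            # Aligned columns: compare query and reference base by base.
            for k in range(length):
                if seq[qi + k].upper() != ref[ri + k].upper():
                    mismatches += 1
        if op in CONSUMES_QUERY:
            qi += length
        if op in CONSUMES_REF:
            ri += length
    if qi != len(seq):
        raise ValueError("CIGAR query length does not match SEQ length")
    return mismatches


# Bases 64-100 of the read and chr1:31021473-31021509, as quoted below.
aligned = "GGAGGCTGAGGCACGAGAATCAATTGAACCTGGGAGG"
ref = "atctctactaaaaatacaaaaaattagccaggcgtgg"
# Placeholder 'N's stand in for the 63 soft-clipped bases, which the
# quoted message does not show; they are skipped and never compared.
seq = "N" * 63 + aligned
print(count_mismatches("63S37M", seq, ref, 0))  # 24
```

For the read quoted in this thread, only 13 of the 37 aligned bases match, so the sketch reports 24 mismatches, well above a 12-mismatch limit, even though the read's own MD:100 and NM:0 tags claim a perfect match.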
I ran this program on your BAM with flags allowing up to 12 mismatches. It found 515,226 alignments that exceed this limit; *all* of them are on the negative strand, and almost all use the 'S' (soft clip) operation in the CIGAR string.

You can run this program yourself by downloading it from here:

http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/validateFiles

You'll also need the hg18.2bit file and the chromosome sizes file, which are here:

http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/hg18.2bit
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/chromInfo.txt.gz

Here is how I ran the program:

$ validateFiles -type=BAM -genome=hg18.2bit -chromInfo=chromInfo.txt.gz *.bam -doReport -showBadAlign -maxErrors=1000000 -mismatches=12 -nMatch

I hope this helps you track down where this error is being introduced. Please respond to this list if you have further questions.

Brian

On Thu, Jun 30, 2011 at 1:07 AM, Ola Wallerman <[email protected]> wrote:
> Hi,
>
> I am using the genome browser to view BAM files which have been
> trimmed to remove overlapping read ends for Illumina PE reads. It
> appears the browser is not correctly placing the clipped reads, see
> the example below. The browser removes the clipped part of the read,
> but the remaining part is always positioned at the start of the
> alignment, meaning that reads on the reverse strand will be misplaced.
> Is this a bug or am I doing something wrong?
>
> Cheers,
>
> Ola
>
> Read name: HWI-ST344_0091:6:1204:11502:145261#CTTGTA
> Position: chr1:31021473-31021509
> Band: 1p35.2
> Genomic Size: 37
> Alignment Quality: 60
> CIGAR string: 63S37M (63 Skipped, 37 (mis)Match)
> Tags: AM:37 LB:cw19 MD:100 NM:0 RG:H3k27ac SM:37 XT:U X0:1 X1:0 XM:0 XO:0 XG:0
> Flags: 0x93:
> (0x80) Read 2 of pair | (0x10) Read is on '-' strand | (0x03)
> Properly paired
> Note: although the read was mapped to the reverse strand of the
> genome, the sequence and CIGAR in BAM are relative to the forward
> strand.
>
> Alignment of HWI-ST344_0091:6:1204:11502:145261#CTTGTA to
> chr1:31021473-31021509:
>
> 00000064 GGAGGCTGAGGCACGAGAATCAATTGAACCTGGGAGG 00000100
> >>>>>>>>             |  | ||  ||||   |  | | || >>>>>>>>
> 31021473 atctctactaaaaatacaaaaaattagccaggcgtgg 31021509
>
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
