Hey Jaaved,

Thanks for the very detailed problem report.  The behavior you are
seeing is due to how MAF coordinates are defined.  If an alignment is
on the negative (-) strand, then the coordinate reported is the number
of bases from the *end* of the sequence.  This means that to convert
the coordinates in the MAF file to the coordinates that blat gives
you, or to extract sequence from a FASTA or 2bit file, you need to
subtract the end coordinate of the range, from the total size of the
scaffold.

So,  if your MAF coordinate is scaffold_13337 1466195 1466344, then to
calculate the start coordinate of the region on the positive strand,
you need to take the size of scaffold_13337, which is 23293914, and
subtract the end coordinate of your range from it:

23293914 - 1466344 = 21827570  which is the 0-based coordinate where
you can find the sequence from the MAF block in the FASTA file.
Remember that the browser reports coordinates in a 1-based system, but
the MAF and PSL files are in 0-based coordinates.

We realize that these coordinate systems can be confusing, but using
these systems internally makes it easier to avoid different code for
alignments on the different strands.

I hope this answers your questions.  If you have any follow-up
questions, please respond to this list.

Brian

On Thu, Jan 27, 2011 at 2:15 PM, Jaaved Mohammed <[email protected]> wrote:
> Hello,
>
> I am seeing numerous discrepancy between the sequences and coordinates in
> the insect 15-way multiple-alignment files
> (http://hgdownload.cse.ucsc.edu/goldenPath/dm3/multiz15way/) primarily for
> D. ananassae and D. virilus. Hopefully I can explain this as concisely as
> possible, however, let me know how I can clarify:
>
> For example:
> 1. I extract the MAF block of a popular melanogaster miRNA (dme-mir-289,
> chr3L:13613907-13614035)using the maf_parse program from Adam Siepel's lab:
> maf_parse -o MAF --start 13613907 --end 13614035 chr3L.maf
> See the attached dme-mir-289.maf output file. Notice that the droAna3
> sequence in this file is
> "CAGCTCGGGTTTTAGGTTGAGTTTACAGTAAAATAAATATTTAAGTGGAGCCTGCGACTctgctactgccactgc
> cactgccactgccactgccGCTCGGGGAGTCACTTGAGCGTTTGTTGGCACGTAAAAGACATCATAATTAGCATT"
> and the coordinate is scaffold_13337:1466195-1466344 -.
>
> 2. When I extract the sequence of this coordinate from the droAna3.fa file
> (http://hgdownload.cse.ucsc.edu/goldenPath/droAna3/bigZips/) and reverse
> complement it, I get a completely different sequence than that reported (see
> the file droAna3_289_maf.fa).
>
> 3. When I blat the sequence reported in the MAF file against droAna3.fa, I
> get a best scoring coordinate that is different from the MAF coordinate.
> That is, scaffold_13337:21827571-21827720 -. No blat reported coordinate
> agrees with the MAF coordinate (see droAna3_blatOutput.txt for all blat
> reported coordinates).  When I extract the sequence of the blat reported
> coordinate and reverse complement it (see droAna3_289_blat.fa), this
> sequence agrees with the MAF reported sequence.
>
> I can furnish additional examples upon request. I really appreciate any time
> and effort spent helping me explain this bizarre observation.
>
> Thanks,
> Jaaved
>
> --
> Jaaved Mohammed,
> Ph.D. Student of Computational Biology
> Tri-Institutional Training Program in Computational Biology and Medicine
> (Cornell University - Ithaca, Weill Cornell Medical College, and Memorial
> Sloan-Kettering Cancer Center)
>
>
>
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>
>

_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome

Reply via email to