Hello,

The version of BLAT on our web site is slightly different from the one available for free download. To compare your results directly with web BLAT, first double-check that you are running against masked genomic sequence. I think this is the major cause of the differing number of alignments. Here is why:
I ran the query against the human genome (the one I am assuming you are using) using web BLAT and got the same six hits that you report.

The best hit, chr17:55,925,487-55,925,588, is significantly better than the others. From bases 26 to 125 the match is identical, with the exception of two gaps around bases 60-70. This location does not overlap any repetitive track annotations or human chained self alignments. It lies in the intron region of two transcripts in the UCSC Genes track (high quality, from RefSeq).

The second best match, chr19:58,193,657-58,193,721, represents less sequence (bases 57 to 126) but at a fairly high identity. Since it also has some overlap with human chained self alignments, this makes me think the alignment is finding a region of (partial) gene duplication. It lies in the intron region of a transcript in the UCSC Genes track (the gene is annotated as non-coding).

The third (bases 29-64) and fourth (bases 58-82) alignments cover their regions at slightly weaker identity. No gene association. No repeat association.

The fifth (bases 28-51) and sixth (bases 80-102) alignments have identities as good as or better than the third and fourth alignments, but both are in genomic regions that overlap very weak LINE repeat elements (RepeatMasker tracks). No gene association.

Based on this data, my initial guess is that you are using an unmasked version of the genome for BLAT. The web version uses masked genomic. The bases represented by the third through sixth alignments may be capturing repeat hits in your local run. In addition, it seems very likely that the small fragment at the beginning of the sequence (bases 1-26 or so, absent from all web BLAT alignments) is capturing repetitive hits. The best alignment is to the negative strand, so these 26 bases "should" align to the right of it in the genomic, and when I scroll in that direction, very strong RepeatMasker matches come up. I also looked downstream to the right and there are more strong repeat matches.
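One quick way to test the masking guess on your side is to look at the FASTA you are running local BLAT against. The following is only a minimal sketch, assuming your genome is available as FASTA text (for example, extracted from a .2bit file with twoBitToFa) and that repeats are soft-masked, meaning written in lowercase as in the UCSC downloads. The function name and the toy record are made up for illustration and are not part of BLAT.

```python
import io


def soft_masked_fraction(fasta_lines):
    """Return {record_name: fraction of lowercase bases} for FASTA text lines.

    Lowercase is taken to mean soft-masked repeat sequence (an assumption
    about how the input file was prepared).
    """
    counts = {}  # record name -> [lowercase bases, total bases]
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Record name is the first word after '>'
            parts = line[1:].split()
            name = parts[0] if parts else ""
            counts[name] = [0, 0]
        elif name is not None:
            counts[name][0] += sum(1 for base in line if base.islower())
            counts[name][1] += len(line)
    return {
        n: (low / total if total else 0.0)
        for n, (low, total) in counts.items()
    }


# Toy input for illustration only, not real genome data
toy = io.StringIO(">chrToy\nACGTacgtACGT\nttttACGT\n")
print(soft_masked_fraction(toy))
```

If every record comes back at 0.0, the file carries no lowercase masking at all, which would fit the extra repeat hits. If the file is soft-masked, local BLAT still has to be told to honor it. I believe the -mask=lower option is the one meant for this, but please check it against the usage message of your BLAT version.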
Your sequence falls in a small window between two strong SINE elements annotated by RepeatMasker.

I hope this helps!

Jennifer Jackson
UCSC Genome Bioinformatics Group

Mittal, Vinay K wrote:
> Hi,
> I was running local blat on my mac osx server for the following sequence:
>
> >seq
> TAGTAGTAGACTAATAACAATAGTAAGAAAAATATTGTTTAATGTATGTATACAATTTATATGGTTTCACAAATCTGTTTAGATGTGTGTTTCAGGCAATTTCTTATTAAAGTTTTTGCTACCTTA
>
> I used the same command line as mentioned on FAQ page to duplicate web BLAT.
>
> From the local blat I got 86 hits on human genome while BLAT on UCSC genome
> website shows only 6 hits for the same.
> Does web based BLAT (on UCSC website) filter out hits based on some criteria??
>
> Thanks.
>
> Sincerely,
> V.
