Hi, Manisha!

Thanks for providing exact command-lines.

Your example query sequence is a repeat sequence;
RepeatMasker marks it as lying in a SINE repeat region.
BLAT will never be able to exhaustively find every possible alignment
of a highly repetitive sequence in the genome.
It is just not designed for that job.

Answers interspersed below:

On 12/02/11 08:15, Brahmachary, Manisha wrote:
> Hi I have a question regarding BLAT on command line and BLAT on web browser.
> 
> I am getting different results when I run the same query sequence in the 
> following way:
> 
> 
> 1.       Query sequence run in command line BLAT  against hg18 2 bit format
> 
> blat -stepSize=5 -repMatch=2253 -minScore=0 -minIdentity=0 
> ~/Projects/Tandem_Repeats/NS_SNP_assoc/BLAT_results_Real_NanostringData/SNP_Genotype/All_Pop_Seperate/Correlation/Correlation_Results_Summary/OTHERS/Human.2bit
>  /home/brahmm01/Projects/BLAT/Query_Sequence.txt Results_blat_hg18.psl
> 
> 
When I do this it matches the output of the browser hgBlat pretty well.
Especially the highest hit is exactly the same on chr14.

I hope you are aware that the browser, for users' benefit, displays
coordinates as 1-based.  Internally, though, nearly all files and
databases store coordinates as 0-based, and ranges in particular
are half-open, e.g. the first 100 bases are
chr14 0 100
where 0 is the starting coordinate, and 100 is not included in the
range.  This simplifies many operations involving coordinates.

http://genome.ucsc.edu/FAQ/FAQtracks.html#tracks1
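
As a minimal sketch of the conversion described above (the function names
to_browser and from_browser are made up for this illustration, they are not
part of any UCSC tool):

```python
def to_browser(chrom, start, end):
    """Convert a 0-based, half-open range to the 1-based, closed form
    that the browser displays."""
    return f"{chrom}:{start + 1}-{end}"


def from_browser(position):
    """Convert a browser-style position such as 'chr14:1-100' back to a
    0-based, half-open (chrom, start, end) tuple."""
    chrom, span = position.split(":")
    first, last = span.replace(",", "").split("-")
    return chrom, int(first) - 1, int(last)


# The first 100 bases of chr14, as stored internally and as displayed.
print(to_browser("chr14", 0, 100))    # chr14:1-100
print(from_browser("chr14:1-100"))    # ('chr14', 0, 100)
```

Only the start changes between the two conventions; the end is the same
number in both, and end - start gives the length in the 0-based form.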

> 
> 
> 
> 2.        Query sequence run in command line BLAT  against only chr14 in 2 
> bit format ( I am considering only chr 14 as the hits on this chromosome are 
> of my prime interest)
> 
> blat -stepSize=5 -repMatch=2253 -minScore=0 -minIdentity=0 
> ~/Projects/Tandem_Repeats/NS_SNP_assoc/BLAT_results_Real_NanostringData/SNP_Genotype/All_Pop_Seperate/Correlation/Correlation_Results_Summary/OTHERS/chr14.2bit
>  /home/brahmm01/Projects/BLAT/Query_Sequence.txt 
> /home/brahmm01/Projects/BLAT/Query_blat_chr14.psl
> 
> 
When you run against only chr14, instead of all of hg18, some of the tiles
will no longer exceed the repMatch threshold, because they are counted on
one chromosome rather than on the whole genome.  This may allow further
seeds to match and extend in repeated regions.  You can at least control
for this to get consistency by using a .ooc file.  This is created by running:


blat -stepSize=5 -repMatch=2253 -makeOoc=11.5.ooc \
  ~/Projects/Tandem_Repeats/NS_SNP_assoc/BLAT_results_Real_NanostringData/SNP_Genotype/All_Pop_Seperate/Correlation/Correlation_Results_Summary/OTHERS/Human.2bit \
  /dev/null /dev/null


And then when you run blat as a query simply include this parameter:
  -ooc=11.5.ooc

So then the over-used tiles should remain consistent even if you are 
only doing one chromosome.
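
To make the effect concrete, here is a toy sketch in Python.  It is not
BLAT's actual indexing code: the sequences, the tile size of 3, the
threshold of 3 and the function names are all invented for illustration,
and it counts overlapping tiles for simplicity.  It only shows why the
set of over-used tiles depends on what you count against.

```python
from collections import Counter


def count_tiles(seq, tile_size):
    """Count every overlapping tile of length tile_size in seq."""
    return Counter(seq[i:i + tile_size]
                   for i in range(len(seq) - tile_size + 1))


def overused(sequences, tile_size, rep_match):
    """Return the tiles that occur more than rep_match times in total."""
    counts = Counter()
    for seq in sequences:
        counts.update(count_tiles(seq, tile_size))
    return {tile for tile, n in counts.items() if n > rep_match}


# A made-up two-"chromosome" genome in which ACG is a repeated tile.
genome = {"chrA": "ACGACGACGTTT", "chrB": "ACGACGGGG"}

whole = overused(genome.values(), tile_size=3, rep_match=3)
only_b = overused([genome["chrB"]], tile_size=3, rep_match=3)

print(sorted(whole))    # ['ACG']  over-used when counted genome-wide
print(sorted(only_b))   # []       not over-used on chrB alone
```

Counted over the whole toy genome, ACG is over-used and would be skipped
as a seed.  Counted on chrB alone it is not, so it can seed extra
alignments.  Reusing the genome-wide set when searching chrB is the role
the .ooc file plays.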

As it is now, when you reduce to only using chr14 without the .ooc file,
it should pick up a few more alignments (5) that are not in the full 
hg18 blat.

The trouble is that your query is a highly repetitive sequence.
There are similar copies all over the genome.  Some are barely on the
edge of being masked out, and may appear or disappear as you change
the parameters.

> 
> 3.       Query sequence run in command line BLAT  against only chr14 in 2 bit 
> format with output format as blast output
> 
>          blat -stepSize=5 -repMatch=2253 -minScore=0 -minIdentity=0 
> /home/brahmm01/Projects/BLAT/chr14.2bit 
> /home/brahmm01/Projects/BLAT/Query_Sequence.txt out=blast8 
> /home/brahmm01/Projects/BLAT/Results_blat_chr14.blast.txt

Blast output is a different beast.  After aligning, the alignment is
split up so that each exon is output separately.

However, your best match appears in the output on chr14 just fine.
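
As a rough sketch of why the start and end positions can look different
between the two formats: a PSL line reports one overall tStart/tEnd plus
comma-separated blockSizes and tStarts lists, whereas per-block output
reports each block's own range.  The function name and the numbers below
are invented for illustration, and strand handling is ignored.

```python
def psl_blocks(t_name, block_sizes, t_starts):
    """Split the PSL blockSizes and tStarts fields (comma-separated,
    with a trailing comma) into per-block 0-based, half-open ranges."""
    sizes = [int(x) for x in block_sizes.split(",") if x]
    starts = [int(x) for x in t_starts.split(",") if x]
    return [(t_name, s, s + n) for s, n in zip(starts, sizes)]


# Hypothetical two-block alignment on chr14.
blocks = psl_blocks("chr14", "50,30,", "1000,1200,")
for chrom, start, end in blocks:
    print(chrom, start, end)

# The single span that the PSL tStart/tEnd columns would cover.
span = (blocks[0][1], blocks[-1][2])
print(span)   # (1000, 1230)
```

A two-block alignment therefore shows up as one range in PSL but as two
separate, shorter ranges once it is split by block.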

> 
> 5.       Query sequence run on the web browser
> 
> 
> 
> I am not sure why I am getting different results in terms of the start and 
> end positions of the target region.
> 
> 

Hope that helps!
-Galt

