Hi Xianjun,

Reading the code is not a bad idea -- in fact, it's the only way I can 
answer your question -- but now that I have read it, the short answer 
is that we translate the coordinates of aligned blocks (links) only, 
not gaps, because we don't have enough information to reliably 
translate coordinates in the middle of gaps.  The "part of chain that 
is in this window" is just the aligned links in the window.

If you would still like to see the code, start in 
kent/src/hg/hgc/hgc.c, function chainToOtherBrowser().  First it calls 
chainSubsetOnT() to chop the chain to the portion in the window, by 
target coordinates.  Then it calls qChainRangePlusStrand(), which 
returns the overall query start and end of the chopped chain.

chainSubsetOnT() is in kent/src/lib/chain.c.  First it finds the first
target link whose tEnd is greater than the start of the window.  Then
as chainFastSubsetOnT() it clips the start of the first aligned link
to the start of the window if necessary, finds the last link in the
window and clips that if necessary.  It builds a new chain from the
processed links.  Score is scaled by the ratio of the new span to the
original span -- maybe crude, but the best we can do without rescoring
the new chain, which would be too computationally intensive.  

I guess you could do your own linear scaling of partial-gap sizes when
there are gaps at the start or end of your window.  That will give you
coordinates of sequences that blastz was not able to align, but that
are adjacent to aligned sequences.  Whether those sequences should be
paired or not, who knows -- I guess that's why we translate by aligned
blocks only.  

Hope that helps,
Angie


On Thu, 11 Jun 2009, Xianjun Dong wrote:

> Hi,
> 
> I am curious how UCSC website locate the position (in species B) 
> corresponding to the part of chain that is in a window (in species A).
> 
> To explain my question clearly, I have an example here. For instance, if 
> you open the zebrafish chain alignment track in the following human 
> locus page:
> http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg18&position=chr11:31787916-31796585
> You can see there are two chains under the region. Clicking one of them, 
> we will see another page:
> http://genome.ucsc.edu/cgi-bin/hgc?hgsid=134848741&o=31083313&t=33335216&g=chainDanRer5&i=124&c=chr11&l=31787915&r=31796585&db=hg18&pix=900
> On this page, there is a link for "Open Zebrafish browser 
> <http://genome.ucsc.edu/cgi-bin/hgTracks?db=danRer5&ct=&position=chr25%3A15181364-15184384>
>  
> at position corresponding to the part of chain that is in this window." 
> Clicking that will exactly bring us to the zebrafish 
> region(chr25:15181364-15184384), which is exactly the region I am 
> interested to get by script. It seems that you have the trick to cut the 
> chain accurately.
> 
> OK, my question is: how do you make this done so exactly and so quickly? 
> Esp. when there are double lines at the end of the query region, which 
> represent more complex gaps involving substantial sequence in both species.
> 
> This might be answered simply by "UCSC are using a super power 
> server....and well-structured databases".
> 
> Well, I guess this might be one of the main reasons. But, beside that, I 
> am curious UCSC must be using some tricky way to get it quickly. I've 
> looked into the Chain (and chainLink) table. Only alignable blocks are 
> stored there, which means it's no problem to get the corresponding 
> position if the start/end position of the query reside inside the 
> blocks. But how about if it resides in the complex gap (double line in 
> the chain track)? How can you know exactly where its corresponding 
> position in zebrafish, for example?
> 
> Can I know the script, or the way how you make it?
> 
> Hope I explain my question clearly
> 
> Thanks
> 
> 
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome

Reply via email to