I used standalone blat, which you can download, compile,
and use for free for non-commercial purposes.
I used the program faFrag to extract a portion of the chr1.fa
FASTA file. You could probably find an equivalent
somewhere, write one of your own, or just compile
the utility from the kent CVS source.
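For the "write one of your own" route, a minimal Python sketch might look like the following. It is not faFrag and makes no claim about faFrag's behavior. The function names are hypothetical, and the 0-based, half-open range is an assumption made for this sketch only. It also reads the whole record into memory, which is acceptable for a demonstration but wasteful for a full chromosome.

```python
def extract_fragment(fasta_lines, start, end):
    """Return (name, subsequence) from the first FASTA record.

    Assumption for this sketch: start/end are 0-based, half-open [start, end).
    """
    name = None
    pieces = []
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                break  # stop at the second record; only the first is used
            words = line[1:].split()
            name = words[0] if words else ""
        elif name is not None:
            pieces.append(line)
    if name is None:
        raise ValueError("no FASTA record found")
    seq = "".join(pieces)
    if not (0 <= start <= end <= len(seq)):
        raise ValueError("range falls outside the sequence")
    return name, seq[start:end]


def format_fasta(name, seq, width=50):
    """Format a record as FASTA text with fixed-width sequence lines."""
    lines = [">" + name]
    lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

To use it on a file, pass an open file handle as fasta_lines and write the result of format_fasta to a new file, which can then serve as the database for standalone blat.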
Although this worked better for your LINE example,
it is not really a general-purpose solution.
However, if you are desperate, I suppose you could
write some scripts to automate chunking your genome
into smaller pieces (faSplit), running standalone blat on each,
and then reassembling the results with various
psl-related utilities (pslSort, pslCDnaFilter, etc.).
That's not to say that we recommend this.
I was doing it just to demonstrate the point
that blat can find stuff, as long as it's not
too repetitive.
The more repetitive the element and the more hits there
are, the more you will have to split the genome into
smaller pieces so that no single piece has enough
hits to trigger the tile-over-use limit.
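The bookkeeping behind such chunking can be sketched in Python as follows. This is only an illustration of the idea, not what faSplit or any psl utility does. The function names are hypothetical and the coordinates are assumed to be 0-based, half-open. Windows overlap so that a hit spanning a boundary still lies entirely inside at least one chunk. For that, the overlap should be at least the query length minus one (26 for a 27-base query).

```python
def chunk_ranges(total_len, chunk_size, overlap):
    """Yield (start, end) windows, 0-based half-open, covering [0, total_len).

    Consecutive windows share `overlap` bases.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    start = 0
    while start < total_len:
        end = min(start + chunk_size, total_len)
        yield start, end
        if end == total_len:
            break
        start = end - overlap  # step back so the next window overlaps this one


def to_genome_coords(chunk_start, hit_start, hit_end):
    """Translate a hit's chunk-local coordinates back to whole-sequence coordinates."""
    return chunk_start + hit_start, chunk_start + hit_end
```

Because of the overlap, a hit that falls inside a shared region can be reported by two chunks, so the merged results would need de-duplicating after the coordinates are translated back.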
I think the general rule of thumb is that
as soon as you see blat producing hundreds of hits,
you know you have a repetitive element,
and that BLAT is not guaranteed to exhaustively
give you all hits genome-wide.
If your element aligns to just a handful of places,
then probably you can be reasonably sure that
that's all there is.
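One way to apply this rule of thumb to PSL output is to count hits per query, as in the Python sketch below. It assumes tab-separated PSL with the query name in the tenth column. The function names are hypothetical, and the default threshold of 100 is an arbitrary stand-in for "hundreds of hits", not a BLAT parameter.

```python
from collections import Counter


def count_hits(psl_lines):
    """Count alignments per query name in tab-separated PSL text."""
    counts = Counter()
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Data rows have 21 columns and begin with a numeric match count;
        # header lines (if present) fail this check and are skipped.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        counts[fields[9]] += 1  # column 10 is qName
    return counts


def likely_repetitive(counts, threshold=100):
    """Return query names with at least `threshold` hits (threshold is an assumption)."""
    return sorted(name for name, n in counts.items() if n >= threshold)
```

Queries flagged this way are the ones for which the hit list should not be treated as exhaustive.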
-Galt
On Wed, 5 Nov 2008, Yuan Jian wrote:
I ran blat against this portion of chr1.
This worked fine and turned up 32 hits including
the one that you were expecting.
How did you do this in blat????
It is difficult to index the entire genome
to its full depth when some tiles would be
hit millions of times. Handling repeats exhaustively
is not one of BLAT's goals.
Jim Kent is working on a new short-sequence aligner program
that might help here, but it is still in development.
RepeatMasker Information
Name: L1PA7
Family: L1
Class: LINE
SW Score: 16289
Divergence: 8.7%
Deletions: 0.8%
Insertions: 0.8%
Begin in repeat: 3132
End in repeat: 4244
Left in repeat: 1902
Position: chr1:247183181-247184298
Band: 1q44
Genomic Size: 1118
Strand: -
-Galt
On Tue, 4 Nov 2008, Yuan Jian wrote:
Hi there,

I have a sequence
TGAAAACTGGCACAAGACAAGGATGCC
located at chr1:247183338-247183364, minus strand.

But when I BLAT it, I cannot find that location for the sequence.
Instead I found it at other loci on chr1, also on the minus strand:
browser details YourSeq   27   1   27   27   100.0%   1   -   219139486   219139512   27
browser details YourSeq   27   1   27   27   100.0%   1   -   192113590   192113616   27
browser details YourSeq   27   1   27   27   100.0%   1   -   189295101   189295127   27
browser details YourSeq   27   1   27   27   100.0%   1   -   156553131   156553157   27
browser details YourSeq   27   1   27   27   100.0%   1   -   156665481   156665507   27
Can you please tell me why?

Thanks,
Yu
_______________________________________________
Genome maillist - [email protected]
http://www.soe.ucsc.edu/mailman/listinfo/genome