Hi, Fahim!

1. You need to use -stepSize=5 with your blat command for short sequences.
   http://genome.ucsc.edu/FAQ/FAQblat.html#blat8
   This says that the minimum-size calculations for tileSize and stepSize
   apply to standalone blat as well as to gfClient/gfServer.

2. If an intron falls in the middle of one of your very short 25 bp
   sequences, each aligned piece is shorter still, and BLAT may not find it.
   Because the HG-Focus is an expression array, you may need to use an RNA
   database instead of a DNA database for blat to find all probes, since
   the probes then won't be split by introns as they are in genomic DNA.
   There are also Affymetrix files with probe TARGET sequences, which are
   much longer (avg 2 kb) and easier to align and locate on the genome.

3. You may need to supply other switches to pslReps to override the
   defaults. For instance:

   -minNearTopSize=N  Minimum size of alignment that is near top for
                      alignment to be kept. Default 30.

   With that default, all of your 25 bp probes would be eliminated.

Good luck!
-Galt

On 12/06/10 08:36, Fahim M wrote:
> Hi
> I am using standalone BLAT for mapping the affymetrix probes onto the
> genomes of human. I am using affymetrix HG-Focus array probes. It is having
> over 97,349 probes and 800 control probes, most of them 25 bases long.
>
> I used the following command:
> ./blatDir/blat -noHead -fine -minScore=25 chrFile queryFile tmpPslFile
> (I am doing blat one chromosome at a time and concatenated all psl files
> generated. Then I sorted the concatenated file using linux sort command
>
> sort -k10,10 -k12,12n concatenatedPslFiles.psl > sorted.psl )
>
> Out of 97,349 probes only around 51,000 (Just 50%) mapped to the genome.
>
> The following link does talk about using blat for short sequence with
> maximum sensitivity.
> http://genome.ucsc.edu/FAQ/FAQblat.html#blat8
> It also talks about using stepSize and the formula to find the shortest
> query size that guarantee a match in gfServer/gfClient. I could not use
> these parameters as it is not an option in blat command.
>
> Q: What command should I use for standalone blat so that I get at least one
> mapped region onto genome.
>
> Next, I used pslReps command as below to find the best alignment.
>
> ./blatDir/pslReps -minCover=0.3 -minAli=0.95 -nearTop=0.005 -nohead
> sortedPslFile out.psl out.psr
>
> To my surprise, I did not find any entry in the best alignment(out.psl)
> file. out.psr does contain around 36,000 entries.
>
> I am confused what to do in this case and could not figure out where am I
> wrong? Please Help.
>
> Thanks and regards
> -Fahim
>

_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
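
As a footnote to point 1 of the reply, the arithmetic behind the -stepSize=5 advice can be sketched in a few lines of shell. This assumes the formula from the FAQ entry linked in the thread (2 * stepSize + tileSize - 1 for nucleotide searches, written here in the general form minMatch * stepSize + tileSize - 1), and it assumes the standalone blat DNA defaults are tileSize=11, stepSize=tileSize and minMatch=2. The function name min_guaranteed_size is only illustrative and is not part of any BLAT tool.

```sh
#!/bin/sh
# Shortest query length guaranteed to produce a hit, per the BLAT FAQ formula:
#   minMatch * stepSize + tileSize - 1
# (assumes an exact match, with no tile landing on an N or a sequence boundary).
min_guaranteed_size() {
    tile_size=$1
    step_size=$2
    min_match=$3
    echo $(( min_match * step_size + tile_size - 1 ))
}

probe_len=25   # HG-Focus probes are mostly 25 bases long

# Assumed standalone blat DNA defaults: tileSize=11, stepSize=tileSize, minMatch=2.
for step in 11 5; do
    need=$(min_guaranteed_size 11 "$step" 2)
    if [ "$probe_len" -ge "$need" ]; then
        verdict="guaranteed to be found"
    else
        verdict="may be missed"
    fi
    echo "stepSize=${step}: need >= ${need} bases, so a ${probe_len}-mer ${verdict}"
done
```

Under these assumptions the default step size needs 32 matching bases, more than a 25-base probe can offer, while -stepSize=5 brings the requirement down to 20. The original command would then become roughly `./blatDir/blat -stepSize=5 -noHead -fine -minScore=25 chrFile queryFile tmpPslFile`, with the other options left as Fahim had them.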
