You have run into the most common question: half-open coordinates, as used in files and tables, versus closed positions, as displayed by the browser for users:
see http://genome.ucsc.edu/FAQ/FAQformat.html#format1

>> For a short sequence (7nt) [...]
>> The following example shows the problem.
>>
>> content of the inputfile containing one human seq location on hg19:
>>
>> chr11 120899760 120899766 pos1 1 +

This actually describes a 6 nt sequence: BED coordinates number the
first base of the chromosome 0, and the base at your chromEnd
(120899766) is not included.

What you probably wanted to put in your bed file is:

chr11 120899759 120899766 pos1 1 +

which would be 7 bases long.

So the minMatch=1 liftOver option does work.

-Galt

6/1/2011 10:20 PM, Maximilian Haussler:
> Hi Mohsen,
>
> I see, my solution was not correct. I actually don't understand why
> minMatch=1 doesn't give you the right answer and am forwarding your
> question to the mailing list again...
>
> cheers
> Max
>
> ---------- Forwarded message ----------
> From: Mohsen Sabouri<[email protected]>
> Date: Wed, Jun 1, 2011 at 6:18 PM
> Subject: RE: [Genome] finding conserved positions with 100% seq.
> identity on a different species genome
> To: Maximilian Haussler<[email protected]>
>
>
> Hi Max,
>
> Thanks for your help. I downloaded pslCDnaFilter. It takes BLAT
> outputs that are in psl format. However, I am using the liftover for
> genome conversion between human and mouse where the outputs are in BED
> format. Is there any way that I can pipe the output of liftover (batch
> coordinate conversion) to pslCDnaFilter or any other filter that can
> do the job?
>
> Many thanks!
>
> Mohsen Sabouri, PhD
> The Scripps Research Institute
> 10550 N. Torrey Pines Road
> La Jolla, CA 92037
> ________________________________________
> From: Maximilian Haussler [[email protected]]
> Sent: Tuesday, May 31, 2011 8:35 PM
> To: Mohsen Sabouri; genome
> Subject: Re: [Genome] finding conserved positions with 100% seq.
> identity on a different species genome
>
> Hi Mohsen,
>
> the pslCDnaFilter program has an option -minId which you could set to
> 1.00 to remove the non-identical alignments. Does this solve your
> problem?
>
> cheers
> Max
>
> On Tue, May 31, 2011 at 10:11 PM, Mohsen Sabouri<[email protected]> wrote:
>> Hi
>>
>> For a short sequence (7nt) in human genome assembly, hg19, I want to find
>> the corresponding 7nt sequence in Mouse assembly mm9, with 100% sequence
>> identity to my human 7nt sequence (assuming it's 100% conserved).
>> I am using liftover. For some of my human sequences liftover points to
>> locations on Mouse that are not 100% identical to my original human
>> sequence. Is there a parameter in liftover that can be adjusted for this?
>>
>> The following example shows the problem.
>>
>> content of the inputfile containing one human seq location on hg19:
>>
>> chr11 120899760 120899766 pos1 1 +
>>
>> the sequence is : CACTTTA
>>
>> liftover command line used:
>>
>> ./liftOver -minMatch=1 -multiple -minSizeT=7 -minSizeQ=7 -bedPlus=6
>> inputfile hg19ToMm9.over.chain outputfile unmapped
>>
>> content of the outputfile produced by liftover containing corresponding
>> conserved mouse coordinates in mm9 assembly is:
>>
>> chr9 42275345 42275351 pos1 1 -
>>
>> when I plug the new coordinates in the genome browser, mm9, it shows
>> content of this sequence as:
>>
>> TAAGGAG
>>
>> which is not 100% match with my original human seq.
>>
>> Ideally if there is no 100% identity, then the outputfile should be empty.
>>
>> Is there any way to fix this?
>>
>> Many thanks!
>>
>>
>> Mohsen Sabouri, PhD
>> The Scripps Research Institute
>> 10550 N. Torrey Pines Road
>> La Jolla, CA 92037
>> _______________________________________________
>> Genome maillist - [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
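
For anyone who wants to check the coordinate arithmetic from the reply at the top of this thread, here is a minimal Python sketch. It only illustrates the half-open (BED) versus closed (browser) conventions using the coordinates from the example. The helper names browser_to_bed and bed_length are made up for this illustration and are not part of liftOver or any UCSC tool.

```python
# Hypothetical helpers (not part of any UCSC tool) illustrating the
# difference between closed 1-based browser positions and half-open
# 0-based BED coordinates.

def browser_to_bed(start_1based, end_1based):
    """Convert a closed, 1-based range to a half-open, 0-based BED range.

    Only the start shifts down by one; the end keeps the same number
    because chromEnd is exclusive in BED.
    """
    return start_1based - 1, end_1based


def bed_length(chrom_start, chrom_end):
    """Number of bases covered by a half-open BED interval."""
    return chrom_end - chrom_start


# Coordinates as originally written into the BED file: only 6 bases.
print(bed_length(120899760, 120899766))  # 6

# Treating the same numbers as browser positions and converting to BED
# gives the 7-base interval suggested in the reply.
chrom_start, chrom_end = browser_to_bed(120899760, 120899766)
print(chrom_start, chrom_end, bed_length(chrom_start, chrom_end))
# 120899759 120899766 7
```

The same rule applies in reverse when pasting a BED line into the browser: add 1 to chromStart and leave chromEnd alone.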
