Hi Kyle,

The best place to find out how UCSC did the processing for a track is what we call the "makedoc". There is one per assembly, containing the processing steps used to build each track. Makedocs are located in the kent source tree:
kent/src/hg/makeDb/doc/hg18.txt
kent/src/hg/makeDb/doc/hg19.txt

These documents can also be browsed online at:
http://genome-test.cse.ucsc.edu/~kent/src/unzipped/hg/makeDb/doc/
(try the link tomorrow; access to genome-test is limited right now)

All of this information is in the makedoc, the track description page, or the online FAQ, but in summary:

For the flanking sequence question:

- 1 For this track specifically, the flanking sequence for both assemblies is at gbdb/hg18/snp/snp130.fa and can be downloaded by FTP from the downloads server. (The flanking sequence data was the same for hg18 and hg19, so we did not duplicate it.)

For the BLAT questions:

- 2a The SNPs come from dbSNP, both the novel content and the reference genome position. This is not parsed from BLAT output.

- 2b IUPAC characters are described on the SNP track's description page, in the section "Re-alignment of the SNP's flanking sequences to the genomic sequence", under "dbSNP flanking sequences and observed allele code for rsXXXXX: (Uses IUPAC ambiguity codes)". Go into the track description for the link, or see it directly at:
  http://genome.ucsc.edu/goldenPath/help/iupac.html

- 3 BLAT documentation comes with the software when you download it, but it is also online here:
  http://genome.ucsc.edu/goldenPath/help/blatSpec.html

This should help you get going again, but please let us know if you need more help, Kyle.

Jennifer

---------------------------------
Jennifer Jackson
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu/

On 3/3/10 3:33 PM, Kyle Tretina wrote:
> To whom it may concern,
>
> I wish to batch automate the re-alignment of all SNP flanking sequences on
> chromosome 16 from ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/rs_fasta/
> and align them to the genomic download for chr16 at
> http://hgdownload.cse.ucsc.edu/downloads.html. I have a couple of questions:
>
> 1) Is there anywhere that I can get the SNP flanking sequences customized
> (i.e.
> only get the positively selected SNP sequences if I have a list of all
> of the rs numbers for those SNPs, and the same for the non-positively
> selected)?
>
> 2) I did a test BLAT run using only a portion of the queries in the SNP
> sequences, and I then converted it to human-readable output using
> pslPretty. However, I was wondering:
> a) How could I identify the base to which the SNP refers in the
> pslPretty output, as it is on the website (see example below)? Below it
> seems to be identified by a "G" for the genomic sequence (?) and an "R" for
> the reference sequence (?)
> b) I am assuming that the output that I received for this test run was in
> order from highest score to lowest for each query. Is there any way to
> modify the parameters so that only the result with the highest score is in
> the output file? Is this what happens on the UCSC website?
>
> 79237487 AAACAAACAGCTTGTTTGTGGTTCGTCCTGAAATCCTCCCTGCTCACAAAACAGCCAGCTACTTGGTTTTCTAAAAGACGTAATTTTGCAGGCAGACTTC 79237586
>          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> 00000201 AAACAAACAGCTTGTTTGTGGTTCGTCCTGAAATCCTCCCTGCTCACAAAACAGCCAGCTACTTGGTTTTCTAAAAGACGTAATTTTGCAGGCAGACTTC 00000300
>
> *79237587 G 79237587
>
> 00000301 R 00000301
>
> *79237588 TAGAGCCATTCTGTGCAGAAGAAGGGAAGGGAGAAGCTGTTTGTTTTACCTGTAGTATGAAGATATTCTTTGCGCTGTTAGAACTGAGCTCATTAATTCT 79237687
>           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> 00000302 TAGAGCCATTCTGTGCAGAAGAAGGGAAGGGAGAAGCTGTTTGTTTTACCTGTAGTATGAAGATATTCTTTGCGCTGTTAGAACTGAGCTCATTAATTCT 00000401
>
> 3) Finally, I was wondering if there are any documents/descriptions online
> of common modifications of BLAT and pslPretty.
>
> I apologize for the length of this email. I am an undergraduate
> bioinformatics intern, and so I have to ask for your patience in helping me.
>
> Kyle Tretina
> Junior
> Wheaton College
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
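For question 2b (keeping only the top-scoring hit per query) and question 2a (finding where a given query base, such as the IUPAC code marking the SNP, lands on the genome), one option is to post-process the raw PSL output before running pslPretty. The sketch below is a minimal Python example, not UCSC's own pipeline. Assumptions: the input is headerless PSL (21 tab-separated columns); the score is the formula used by kent's pslScore() for nucleotide alignments (match + repMatch/2 - misMatch - qNumInsert - tNumInsert), as we understand it; and only "+" strand alignments are mapped. The names PslRecord, best_hit_per_query and query_to_target are hypothetical helpers. The kent utility pslReps is designed for choosing best alignments and may be preferable; check its usage message for its options.

```python
"""Post-process BLAT PSL output: a sketch under stated assumptions."""
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class PslRecord:
    # Only the PSL columns this sketch needs (hypothetical helper type).
    matches: int
    mis_matches: int
    rep_matches: int
    q_num_insert: int
    t_num_insert: int
    strand: str
    q_name: str
    t_name: str
    block_sizes: List[int]
    q_starts: List[int]
    t_starts: List[int]

    def score(self) -> int:
        # Assumed to mirror kent's pslScore() for nucleotide PSL;
        # integer division stands in for the C code's repMatch >> 1.
        return (self.matches + self.rep_matches // 2
                - self.mis_matches - self.q_num_insert - self.t_num_insert)


def _ints(field: str) -> List[int]:
    # PSL list columns are comma-separated, usually with a trailing comma.
    return [int(x) for x in field.split(",") if x]


def parse_psl_line(line: str) -> PslRecord:
    f = line.rstrip("\n").split("\t")
    if len(f) != 21:
        raise ValueError(f"expected 21 PSL columns, got {len(f)}")
    return PslRecord(
        matches=int(f[0]), mis_matches=int(f[1]), rep_matches=int(f[2]),
        q_num_insert=int(f[4]), t_num_insert=int(f[6]), strand=f[8],
        q_name=f[9], t_name=f[13],
        block_sizes=_ints(f[18]), q_starts=_ints(f[19]), t_starts=_ints(f[20]),
    )


def best_hit_per_query(lines: Iterable[str]) -> Dict[str, PslRecord]:
    """Keep only the highest-scoring alignment for each query name.

    On a tie, the alignment seen first is kept.
    """
    best: Dict[str, PslRecord] = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        rec = parse_psl_line(line)
        cur = best.get(rec.q_name)
        if cur is None or rec.score() > cur.score():
            best[rec.q_name] = rec
    return best


def query_to_target(rec: PslRecord, q_pos: int) -> Optional[int]:
    """Map a 0-based query position to a 0-based target position.

    Returns None if the position falls in a gap between aligned blocks.
    Only '+' strand alignments are handled in this sketch.
    """
    if rec.strand != "+":
        raise NotImplementedError("only '+' strand alignments are handled")
    for size, q_start, t_start in zip(rec.block_sizes, rec.q_starts, rec.t_starts):
        if q_start <= q_pos < q_start + size:
            return t_start + (q_pos - q_start)
    return None
```

PSL columns use 0-based starts, while the positions printed by pslPretty in the example above appear to count from 1, so convert before calling query_to_target. In that example, the query base at 00000301 is "R", which the IUPAC page lists as A or G, aligned against the genomic "G" at 79237587.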
