Hi Kyle,

The best place to find out how UCSC processed a track is what we call 
the "makedoc". There is one per assembly, and it records how each track 
was built. Makedocs are located in the kent source tree:

kent/src/hg/makeDb/doc/hg18.txt
kent/src/hg/makeDb/doc/hg19.txt

These documents can also be browsed online at:
http://genome-test.cse.ucsc.edu/~kent/src/unzipped/hg/makeDb/doc/
(Try the link tomorrow; access to genome-test is limited right now.)

All of this information is in the makedoc, the track description page, 
or the online FAQ, but in summary:

For the flanking sequence question:

- 1 Specifically for this track, the flanking sequences for both 
assemblies are in gbdb/hg18/snp/snp130.fa, which can be downloaded by 
FTP from the downloads server. (The flanking sequence data was the same 
for hg18 and hg19, so we did not duplicate it.)
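For the "customized" subset asked about in question 1, one option is to download the full flanking-sequence FASTA and filter it locally against your own list of rs numbers. Below is a minimal Python sketch of that idea, not a UCSC tool. It assumes each FASTA header contains the rs ID as a separate token (for example ">rs12345 ..." or ">gnl|dbSNP|rs12345 ..."); check the actual header format of snp130.fa or the NCBI rs_fasta files and adjust the pattern if it differs. The helper names are hypothetical.

```python
import re
from typing import Iterable, Iterator, Optional, Set, TextIO, Tuple

# Assumption: the rs ID appears as its own token in each header line.
RS_ID = re.compile(r"\brs\d+\b")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header: Optional[str] = None
    parts = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None and line:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def filter_by_rs(records: Iterable[Tuple[str, str]],
                 wanted: Set[str]) -> Iterator[Tuple[str, str]]:
    """Keep only records whose header contains an rs ID listed in `wanted`."""
    for header, seq in records:
        m = RS_ID.search(header)
        if m and m.group(0) in wanted:
            yield header, seq


def write_fasta(records: Iterable[Tuple[str, str]], handle: TextIO,
                width: int = 60) -> None:
    """Write records back out as FASTA, wrapping sequences at `width` bases."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")
```

Running it once with the positively selected rs numbers and once with the rest would give the two query files; if your downloaded copy is gzip-compressed, `gzip.open(path, "rt")` can stand in for `open(path)`.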

For the BLAT questions:

- 2a The SNPs come from dbSNP (both the novel content and the reference 
genome position); this information is not parsed from BLAT output.

- 2b The use of IUPAC characters is noted on the SNP track's description 
page, in the section "Re-alignment of the SNP's flanking sequences to the 
genomic sequence", where it reads:
    dbSNP flanking sequences and observed allele code for rsXXXXX:
    (Uses IUPAC ambiguity codes)
Follow the link from the track description, or go directly to:
http://genome.ucsc.edu/goldenPath/help/iupac.html

- 3 BLAT documentation comes with the software when you download it, but 
is also online here:
http://genome.ucsc.edu/goldenPath/help/blatSpec.html
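For questions 2a and 2b, a minimal Python sketch of post-processing your own BLAT results is below. It is an illustration under stated assumptions, not how UCSC builds the SNP track (positions there come from dbSNP, as noted above). It assumes you work on the tab-separated PSL output (before pslPretty), uses a nucleotide score of matches + repMatches/2 - misMatches - qNumInsert - tNumInsert (modeled on the web BLAT score; treat that as an assumption), keeps the single best row per query, handles plus-strand rows only, and assumes you know the SNP's offset in the query (the length of the 5' flank). The IUPAC table is the standard ambiguity code set; all function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Standard IUPAC nucleotide ambiguity codes.
IUPAC: Dict[str, str] = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def alleles(code: str) -> Set[str]:
    """Expand one IUPAC code into the set of bases it stands for."""
    return set(IUPAC[code.upper()])


def psl_score(f: List[str]) -> int:
    """Assumed nucleotide score: match + repMatch/2 - misMatch - qNumInsert - tNumInsert."""
    return int(f[0]) + int(f[2]) // 2 - int(f[1]) - int(f[4]) - int(f[6])


def best_hits(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Keep only the highest-scoring PSL row per query name (qName, field 10)."""
    best: Dict[str, Tuple[int, List[str]]] = {}
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 21 or not f[0].isdigit():
            continue  # skip the psLayout header, separator and blank lines
        score = psl_score(f)
        if f[9] not in best or score > best[f[9]][0]:
            best[f[9]] = (score, f)  # ties keep the first row seen
    return {name: row for name, (_, row) in best.items()}


def ints(field: str) -> List[int]:
    """Parse a comma-terminated PSL list such as '201,0,'."""
    return [int(x) for x in field.split(",") if x]


def query_to_target(f: List[str], q_pos: int) -> Optional[int]:
    """Map a 0-based query offset to a 0-based target offset through the
    alignment blocks; returns None if q_pos is not inside any block."""
    if f[8] != "+":
        raise ValueError("only plus-strand rows are handled in this sketch")
    for size, qs, ts in zip(ints(f[18]), ints(f[19]), ints(f[20])):
        if qs <= q_pos < qs + size:
            return ts + (q_pos - qs)
    return None
```

In Kyle's excerpt the 100-base block is numbered 201 to 300, which suggests pslPretty counts from 1; if so, the R at query base 301 is 0-based offset 300, `query_to_target(row, 300)` gives its 0-based genome offset, and `"G" in alleles("R")` is true because R is the IUPAC code for A or G. In other words, the "R" is most likely the observed-allele code in the dbSNP flanking sequence rather than a marker for "reference".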

This should help you get going again, but please let us know if you need 
more help, Kyle.

Jennifer

---------------------------------
Jennifer Jackson
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu/

On 3/3/10 3:33 PM, Kyle Tretina wrote:
> To whom it may concern,
>
> I wish to batch automate the re-alignment of all SNP flanking sequences on
> chromosome 16 from ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/rs_fasta/
> and align that to the genomic download for chr16 at
> http://hgdownload.cse.ucsc.edu/downloads.html. I
> have a couple of questions:
>
> 1) Is there anywhere that I can get the SNP flanking sequences customized
> (i.e. only get the positively selected SNP sequences if I have a list of all
> of the rs numbers for those SNP's, and the same for the non-positively
> selected)?
>
> 2) I did a test BLAT run using only a portion of the queries in the SNP
> sequences and I then converted it to the human readable output using
> pslPretty. However, I was wondering
>     a) How I could identify the base to which the SNP refers in the
> pslPretty output, as it is on the website (see example below)? Below it
> seems to be identified by a "G" for the genomic sequence (?) and an (R) for
> the reference sequence (?)
>     b) I am assuming that the output that I received for this test run was in
> order from highest score to lowest for each query. Is there any way to
> modify the parameters so that only the result with the highest score is in
> the output file? Is this what happens on the ucsc website?
>
> 79237487 AAACAAACAGCTTGTTTGTGGTTCGTCCTGAAATCCTCCCTGCTCACAAAACAGCCAGCTACTTGGTTTTCTAAAAGACGTAATTTTGCAGGCAGACTTC 79237586
>          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> 00000201 AAACAAACAGCTTGTTTGTGGTTCGTCCTGAAATCCTCCCTGCTCACAAAACAGCCAGCTACTTGGTTTTCTAAAAGACGTAATTTTGCAGGCAGACTTC 00000300
>
> *79237587 G 79237587
>
> 00000301 R 00000301
>
> *79237588 TAGAGCCATTCTGTGCAGAAGAAGGGAAGGGAGAAGCTGTTTGTTTTACCTGTAGTATGAAGATATTCTTTGCGCTGTTAGAACTGAGCTCATTAATTCT 79237687
>           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> 00000302  TAGAGCCATTCTGTGCAGAAGAAGGGAAGGGAGAAGCTGTTTGTTTTACCTGTAGTATGAAGATATTCTTTGCGCTGTTAGAACTGAGCTCATTAATTCT 00000401
>
> 3) Finally, I was wondering if there were any documents/descriptions online
> of common modifications of BLAT and pslPretty.
>
> I apologize for the length of this email. I am an undergraduate
> bioinformatics intern, and so I have to ask for your patience in helping me.
>
> Kyle Tretina
> Junior
> Wheaton College
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome