Hi Suganthi,

A follow-up: I have remapped the HGDP SNPs to hg18 using snp130 coordinates 
(instead of snp129) and snp130 reference genome alleles.  The reference alleles 
let me normalize the HGDP alleles to the forward strand of the reference and 
discard ~45 SNPs with non-single-base reference alleles.  I have made the file 
temporarily available here:

http://genome-test.cse.ucsc.edu/~angie/hg18.hgdpGeo.tab

And you can view the updated track in the genome-test.cse.ucsc.edu genome 
browser, with the caveat that it is our test server and both software and data 
may be unstable.  

Thank you for bringing this to our attention!  Please contact the list again if 
you have any more questions.  

Angie

----- "Angie Hinrichs" <[email protected]> wrote:

> From: "Angie Hinrichs" <[email protected]>
> To: "Suganthi Bala" <[email protected]>
> Cc: [email protected]
> Sent: Wednesday, September 15, 2010 11:36:10 AM GMT -08:00 US/Canada Pacific
> Subject: Re: [Genome] HGDP SNP data
>
> Hi Suganthi,
> 
> You're right -- these were not corrected for strand, and the schema
> description is incorrect.  I will revisit the HGDP data files and see
> if there's a way to identify and fix these cases.
> 
> An expedient possibility, if you are adept at Perl or some other
> programming language: read in genome fasta sequence, discard newlines
> etc, and store into large strings (possibly one chrom at a time, since
> input is sorted; read chrom seq each time a new chrom appears).  For
> each SNP, look up the reference base at the given coord (substr).  If
> neither allele matches the ref, but the ref does match a complemented
> allele (I guess it had better), replace the given allele with the
> complemented allele.
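
A minimal sketch of the approach described in the paragraph above, in Python
rather than Perl.  It is an illustration under stated assumptions, not the
procedure used to build the track: the function names, the record layout
(name, chrom, 0-based position, two alleles) and the choice to hold all
sequences in one dict are hypothetical.  For a whole genome you would probably
load one chromosome at a time, as suggested above.

```python
# Hypothetical helper names; record layout and 0-based positions are assumptions.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(allele):
    """Return the base-by-base complement of an allele string."""
    return allele.translate(COMPLEMENT)


def read_fasta(lines):
    """Yield (sequence name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()  # discard newlines and surrounding whitespace
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def normalize_alleles(ref_base, allele1, allele2):
    """Return the allele pair on the reference strand, or None if neither
    the given alleles nor their complements include the reference base."""
    ref = ref_base.upper()
    alleles = (allele1.upper(), allele2.upper())
    if ref in alleles:
        return alleles  # already on the reference strand
    flipped = tuple(complement(a) for a in alleles)
    if ref in flipped:
        return flipped  # reported on the opposite strand
    return None


def normalize_snps(snps, genome):
    """Look up the reference base for each SNP and normalize its alleles.

    snps: iterable of (name, chrom, pos0, allele1, allele2), pos0 0-based.
    genome: dict mapping chrom name to sequence string.
    """
    for name, chrom, pos0, a1, a2 in snps:
        yield name, chrom, pos0, normalize_alleles(genome[chrom][pos0], a1, a2)
```

Any SNP for which normalize_alleles returns None matches the reference on
neither strand and would need a closer look before use.
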
> 
> This would leave some ambiguity of ancestral vs derived if the dataset
> included C/G or A/T SNPs, but it doesn't (by design of the Illumina
> assay, pers. comm. Devin Absher).
> 
> Sorry for the inconvenience,
> Angie
> 
> 
> ----- "Suganthi Bala" <[email protected]> wrote:
> 
> > From: "Suganthi Bala" <[email protected]>
> > To: [email protected]
> > Sent: Tuesday, September 14, 2010 7:50:28 PM GMT -08:00 US/Canada Pacific
> > Subject: [Genome] HGDP SNP data
> >
> > Hi,
> >
> > This pertains to the data that I downloaded for HGDP SNPs via the Table
> > Browser for the HG18 build. It appears that the SNPs are not always
> > reported with respect to the forward strand of the reference genome, even
> > though that is what the table schema indicates. For example, the
> > following SNPs: rs2296441, rs12782963, rs4758443, etc.
> >
> > Is it possible that it was mistakenly not corrected for strand
> > orientation? If yes, is it possible to get a fixed file quickly? Thanks.
> >
> > Best,
> > Suganthi Bala
> > Yale University
> > _______________________________________________
> > Genome maillist  -  [email protected]
> > https://lists.soe.ucsc.edu/mailman/listinfo/genome
