Hi Aanchal,

Since rendering to pixels inherently loses precision, and the volume of data 
for LD is quite large, our database table uses a lossy compression scheme for 
those values, developed by Daryl Thomas.  This scheme (details below) could be 
reversed to get approximate/binned values for r^2, D' and LOD.  However, it 
would probably be better to use the precise values from the .LD data files that 
were processed into our database representation.  

Unfortunately, the .LD files used to make the hg18.hapmapLd* tables were lost 
in a disk crash.  Fortunately, HapMap has .LD files from a more recent release 
of genotypes -- see files in 
ftp://ftp.hapmap.org/hapmap/ld_data/2009-02_phaseIII_r2/ including 00README.txt 
which explains the .LD file format.  

Download HapMap's files from that ftp link and make a file "myRsIds.txt" 
that contains your rs#'s of interest, with one rs# per line, like this:

rs12627640
rs240444
rs10432925

Then for each chromosome and population code, a command like this will extract 
the relevant lines of the downloaded file:

zcat ld_${chr}_${pop}.txt.gz | grep -Fwf myRsIds.txt > myLD_${chr}_${pop}.txt

The paired rsIds will be in the 4th and 5th columns, r^2 in the 7th column.
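
If you would rather not type the command once per file, here is a rough sketch 
that runs the same zcat/grep step over every downloaded ld_*_*.txt.gz file in 
the current directory and then prints the rsId pairs with r^2.  The function 
names extract_ld and ld_pairs are made up for illustration, and the sketch 
assumes the columns are whitespace-separated.

```shell
#!/bin/sh
# Sketch only: extract_ld and ld_pairs are illustrative names, not part of
# any HapMap or UCSC tool.

# extract_ld IDFILE: for each ld_<chr>_<pop>.txt.gz in the current directory,
# write the lines mentioning any rs# in IDFILE to myLD_<chr>_<pop>.txt.
extract_ld() {
    ids=$1
    for f in ld_*_*.txt.gz; do
        [ -e "$f" ] || continue          # the glob matched nothing
        out=${f#ld_}                     # e.g. chr21_CEU.txt.gz
        out="myLD_${out%.txt.gz}.txt"    # e.g. myLD_chr21_CEU.txt
        # gzip -dc behaves like zcat; -F = fixed strings, -w = whole words.
        # grep exits 1 when nothing matches, which is not an error here.
        gzip -dc "$f" | grep -Fwf "$ids" > "$out" || [ $? -eq 1 ]
    done
}

# ld_pairs FILE...: print columns 4, 5 and 7 (rsId, rsId, r^2),
# assuming whitespace-separated columns.
ld_pairs() {
    awk '{ print $4, $5, $7 }' "$@"
}
```

Typical use would be "extract_ld myRsIds.txt" followed by 
"ld_pairs myLD_*.txt".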


Details of the lossy compression scheme, for the record:

D' and r^2 values in the range of [0,1] are encoded like this:

encodedValue = 'a' + (actualValue * 9)

D' values in the range [-1,0) are encoded like this:

encodedValue = 'A' - (actualValue * 9)
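
If you do want to reverse the encoding for D' or r^2, something like the 
following sketch should give approximate values.  It assumes the multiplied 
value is truncated to an integer before being added to the letter, so each 
letter stands for a bin about 1/9 wide and the number printed is the edge of 
the bin nearest zero.  decode_ld_char is a made-up name.

```shell
#!/bin/sh
# Sketch only: decode_ld_char is an illustrative name.
# Assumption: the encoder truncates (actualValue * 9) to an integer, so
# 'a'..'j' cover [0,1] and 'A'..'J' cover the negative D' range.

# decode_ld_char C: print the approximate D' or r^2 value for one character.
decode_ld_char() {
    code=$(printf '%d' "'$1")            # numeric (ASCII) code of the character
    if [ "$code" -ge 97 ] && [ "$code" -le 106 ]; then      # 'a'..'j'
        awk -v k="$((code - 97))" 'BEGIN { printf "%.2f\n", k / 9 }'
    elif [ "$code" -ge 65 ] && [ "$code" -le 74 ]; then     # 'A'..'J'
        awk -v k="$((code - 65))" 'BEGIN { printf "%.2f\n", 0 - k / 9 }'
    else
        echo "NA"                        # not a D'/r^2 letter (e.g. 'y', 'z')
    fi
}
```

For example, "decode_ld_char e" prints 0.44 and "decode_ld_char J" prints 
-1.00.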

For LOD it's more complicated and involves the absolute value of D' (|D'|):
* if LOD >= 2 and |D'| < 0.5, then encodedValue = 'y' (pink).
* if LOD < 2 and |D'| < 0.99, then encodedValue = 'z' (blue).
* otherwise, encodedValue = 'a' + min(9, (LOD - |D'| - 1.5))

After actual values are transformed into alphabetic characters, the 
characters are concatenated into strings in order of the SNPs' appearance after 
the current SNP.  So the Nth character in the concatenated string represents 
the score between the current SNP and the Nth SNP that follows it.  
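
To see that layout for one encoded string, a small sketch like this lists each 
character next to its offset N, meaning the Nth SNP after the current one.  
ld_offsets is a made-up name, and the sketch assumes the string is plain ASCII.

```shell
#!/bin/sh
# Sketch only: ld_offsets is an illustrative name.

# ld_offsets STRING: print "N<TAB>char" for each character of STRING, where
# N is the offset of the SNP that the character scores against.
ld_offsets() {
    printf '%s\n' "$1" | fold -w 1 | awk '{ printf "%d\t%s\n", NR, $0 }'
}
```

Each character in the second column can then be mapped back to an approximate 
value using the encoding rules above.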

Hope that helps,
Angie

----- "aanchal sharma" <[email protected]> wrote:

> From: "aanchal sharma" <[email protected]>
> To: [email protected]
> Sent: Friday, June 3, 2011 12:23:41 AM GMT -08:00 US/Canada Pacific
> Subject: [Genome] How to interprate D prime , r^2 and LOD values in LD phased 
> data
>
> Dear Sir /Madam
> 
> For a certain set of SNPs I want to know which SNPs are in LD with the
> query SNPs, and their r^2 values. The output that I am downloading from the
> UCSC tables gives me a table in which D prime, r^2 and LOD scores are
> represented as letters. I am unable to interpret the results. Also, how
> can I get the list of all other SNPs in LD with my query SNPs?
> Waiting for your reply.
> 
> Regards
> Aanchal
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome