Hello,

I used the UCSC Table Browser to download the complete snp130 table 
as a tab-separated text file. Since I am only interested in single 
nucleotide substitutions, where one nucleotide is replaced by another, 
I kept only the lines of this file whose 'class' field is equal to 
'single'. However, when I look at the resulting subset of entries, I 
find some lines that I did not expect to see in the filtered file. 

Here are some of the lines for which I would like further 
explanation:

#bin    chrom    chromStart    chromEnd    name    score    strand    refNCBI    refUCSC    observed    molType    class    valid    avHet    avHetSE    func    locType    weight
212    chr1    146668316    146677849    rs2137935    0    +    ( 9533bp insertion )    ( 9533bp insertion )    C/G    genomic    single    unknown    0    0    unknown    range    3
585    chr1    126656    126673    rs72497839    0    -    GCTCGGGCTGACCTCTC    GCTCGGGCTGACCTCTC    A/C    genomic    single    unknown    0    0    unknown    range    1
585    chr1    92822    92822    rs4317776    0    -    -    -    A/C    genomic    single    unknown    0    0    unknown    between    3
586    chr1    155165    155165    rs1974329    0    -    -    -    G/T    genomic    single    unknown    0    0    unknown    between    3
586    chr1    148894    148895    rs4111311    0    -    C    C    G/T    genomic    single    unknown    0    0    unknown    rangeDeletion    3

For the first two lines, I think I get the point: a group of 
nucleotides is replaced by a single one, so the entry is given the 
class 'single', but the 'locType' field is set to 'range' because the 
reference allele spans a range of nucleotides. So the first two lines 
should be correct.

However, for the third and fourth lines, I do not understand why the 
class is 'single', since they appear to be insertions and should have 
the 'insertion' class in the database.

Finally, for the fifth line, I do not understand why the 'locType' 
field is 'rangeDeletion', since it appears to be a single nucleotide 
substitution and the value of 'locType' should be 'exact'.

Are there minor mistakes in the snp130 table, or did I miss something 
about the classification of the entries?

Consequently, if I want to extract only single nucleotide 
substitutions, where a single nucleotide is replaced by another single 
nucleotide, should I select the entries whose 'class' and 'locType' 
fields are equal to 'single' and 'exact' respectively? Or could 
undesired entries still pass this filter?
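
To make the question concrete, here is a minimal Python sketch of the 
filter I have in mind. The names SNP130_COLUMNS, is_simple_substitution 
and filter_snps are my own illustrative choices, not part of any UCSC 
tool. The additional test that chromEnd - chromStart equals 1 is an 
assumption on my side (it relies on the 0-based start, 1-based end 
convention of the table) and not a documented rule for snp130.

```python
import csv
import io

# Column names as they appear in the header of the downloaded file.
SNP130_COLUMNS = [
    "bin", "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "refNCBI", "refUCSC", "observed", "molType", "class", "valid",
    "avHet", "avHetSE", "func", "locType", "weight",
]


def is_simple_substitution(row):
    """True if the row has class 'single', locType 'exact' and spans one base."""
    # Extra sanity check (an assumption, not a documented rule): with a
    # 0-based start and 1-based end, a one-base feature has end - start == 1.
    span = int(row["chromEnd"]) - int(row["chromStart"])
    return row["class"] == "single" and row["locType"] == "exact" and span == 1


def filter_snps(handle):
    """Yield the rows of a tab-separated snp130 dump that pass the filter."""
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames is None:  # empty input: nothing to yield
        return
    # The header line starts with '#bin'; drop the '#' so the key is 'bin'.
    reader.fieldnames = [name.lstrip("#") for name in reader.fieldnames]
    for row in reader:
        if is_simple_substitution(row):
            yield row


if __name__ == "__main__":
    # Fifth line quoted above (rs4111311, locType 'rangeDeletion').
    header = "#" + "\t".join(SNP130_COLUMNS) + "\n"
    record = "\t".join([
        "586", "chr1", "148894", "148895", "rs4111311", "0", "-", "C", "C",
        "G/T", "genomic", "single", "unknown", "0", "0", "unknown",
        "rangeDeletion", "3",
    ]) + "\n"
    kept = list(filter_snps(io.StringIO(header + record)))
    print([row["name"] for row in kept])
```

On the real download, filter_snps would be called on an open file 
handle instead of the in-memory sample. Whether these three conditions 
are sufficient is exactly what I am unsure about.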

Thank you for reading.

Best regards

David

-- 
David Gacquer, Ph. D.

IRIBHM - Universite Libre de Bruxelles
Bldg C, room C.4.117
ULB, Campus Erasme, CP602
808 route de Lennik
B-1070 Brussels
Belgium

Phone: +32-2-555 4187
Fax: +32-2-555 4655
E-mail: dgacquer at ulb.ac.be 

_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
