Hello, I have used the UCSC table browser to download the complete snp130 table as a tab-separated text file. Since I am only interested in single nucleotide substitutions, where one nucleotide is replaced by another one, I selected from this file only the lines for which the 'class' field is equal to 'single'. But when I take a look at the resulting subset of entries, I realize that there are some lines that should not appear in the filtered file.
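The selection described above could be reproduced with a short Python script. This is only a minimal sketch: it assumes the downloaded file is tab-separated with a header line that starts with "#bin", and the helper names iter_single_class and loc_type_counts are hypothetical, not part of any UCSC tool. Tallying the locType values among the 'single' rows is one way to see which unexpected entries survive the filter.

```python
import csv
import io
from collections import Counter


def iter_single_class(handle):
    """Yield rows (as dicts) whose 'class' column equals 'single'."""
    # The header line starts with '#bin'; drop the '#' to get clean names.
    header = handle.readline().lstrip("#").rstrip("\r\n").split("\t")
    reader = csv.DictReader(handle, fieldnames=header, delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    for row in reader:
        if row["class"] == "single":
            yield row


def loc_type_counts(handle):
    """Count the locType values among the class == 'single' rows."""
    return Counter(row["locType"] for row in iter_single_class(handle))


# Tiny in-memory demo built from two rows quoted in this message.
COLUMNS = ("bin chrom chromStart chromEnd name score strand refNCBI refUCSC "
           "observed molType class valid avHet avHetSE func locType "
           "weight").split()
ROW_A = ("585 chr1 92822 92822 rs4317776 0 - - - A/C genomic single "
         "unknown 0 0 unknown between 3").split()
ROW_B = ("586 chr1 148894 148895 rs4111311 0 - C C G/T genomic single "
         "unknown 0 0 unknown rangeDeletion 3").split()
DEMO = "\n".join(["#" + "\t".join(COLUMNS),
                  "\t".join(ROW_A),
                  "\t".join(ROW_B)]) + "\n"

print(loc_type_counts(io.StringIO(DEMO)))
```

For the real file, the same functions can be called on an open text handle instead of the in-memory demo string.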
Here is a copy/paste of some lines for which I need additional explanations:

#bin  chrom  chromStart  chromEnd   name        score  strand  refNCBI               refUCSC               observed  molType  class   valid    avHet  avHetSE  func     locType        weight
212   chr1   146668316   146677849  rs2137935   0      +       ( 9533bp insertion )  ( 9533bp insertion )  C/G       genomic  single  unknown  0      0        unknown  range          3
585   chr1   126656      126673     rs72497839  0      -       GCTCGGGCTGACCTCTC     GCTCGGGCTGACCTCTC     A/C       genomic  single  unknown  0      0        unknown  range          1
585   chr1   92822       92822      rs4317776   0      -       -                     -                     A/C       genomic  single  unknown  0      0        unknown  between        3
586   chr1   155165      155165     rs1974329   0      -       -                     -                     G/T       genomic  single  unknown  0      0        unknown  between        3
586   chr1   148894      148895     rs4111311   0      -       C                     C                     G/T       genomic  single  unknown  0      0        unknown  rangeDeletion  3

For the first two lines, I think I get the point: since a group of nucleotides is replaced by a single one, the entry is given the class 'single', but the 'locType' field is set to 'range' because it is a range of nucleotides that is actually replaced by a single one. So the first two lines should be correct.

However, for the third and fourth lines, I do not understand why the class is 'single', since apparently they are insertions and should have the 'insertion' class in the database.

Finally, for the fifth line, I do not understand why the 'locType' field is 'rangeDeletion', since apparently it is a single nucleotide substitution and the value of 'locType' should be 'exact'.

Are there minor mistakes in the snp130 table, or did I miss something about the classification of the entries?

Consequently, if I want to extract only single nucleotide substitutions, where a single nucleotide is replaced by another single nucleotide, should I select the entries for which the 'class' and 'locType' fields are equal to 'single' and 'exact', respectively? Or is it still possible for undesired entries to pass this filter?

Thank you for reading.

Best regards,
David

--
David Gacquer, Ph. D.
IRIBHM - Universite Libre de Bruxelles
Bldg C, room C.4.117
ULB, Campus Erasme, CP602
808 route de Lennik
B-1070 Brussels
Belgium
Phone: +32-2-555 4187
Fax: +32-2-555 4655
E-mail: dgacquer at ulb.ac.be
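As an illustration of the stricter filter asked about in this message, here is a minimal Python sketch. It is not a confirmed recipe: besides requiring class 'single' and locType 'exact', it adds two extra checks that are assumptions of this sketch, namely that the region spans exactly one base (UCSC coordinates are 0-based and half-open, so chromEnd - chromStart should be 1) and that the reference and observed alleles are single nucleotides. The function name is_simple_substitution is hypothetical.

```python
NUCLEOTIDES = {"A", "C", "G", "T"}

COLUMNS = ("bin chrom chromStart chromEnd name score strand refNCBI refUCSC "
           "observed molType class valid avHet avHetSE func locType "
           "weight").split()

# The five rows quoted in this message, written with '|' as a separator
# only so that fields containing spaces stay intact in this sketch.
EMAIL_ROWS = [
    "212|chr1|146668316|146677849|rs2137935|0|+|( 9533bp insertion )|"
    "( 9533bp insertion )|C/G|genomic|single|unknown|0|0|unknown|range|3",
    "585|chr1|126656|126673|rs72497839|0|-|GCTCGGGCTGACCTCTC|"
    "GCTCGGGCTGACCTCTC|A/C|genomic|single|unknown|0|0|unknown|range|1",
    "585|chr1|92822|92822|rs4317776|0|-|-|-|A/C|genomic|single|unknown|"
    "0|0|unknown|between|3",
    "586|chr1|155165|155165|rs1974329|0|-|-|-|G/T|genomic|single|unknown|"
    "0|0|unknown|between|3",
    "586|chr1|148894|148895|rs4111311|0|-|C|C|G/T|genomic|single|unknown|"
    "0|0|unknown|rangeDeletion|3",
]


def is_simple_substitution(row):
    """Return True if a snp130 row (a dict) looks like a one-base substitution."""
    if row["class"] != "single" or row["locType"] != "exact":
        return False
    # Assumption of this sketch: a single base spans exactly one position
    # in 0-based, half-open coordinates.
    if int(row["chromEnd"]) - int(row["chromStart"]) != 1:
        return False
    # Assumption of this sketch: the reference allele is one nucleotide.
    if row["refUCSC"] not in NUCLEOTIDES:
        return False
    # Every observed allele must be one nucleotide, with at least two alleles.
    alleles = row["observed"].split("/")
    return len(alleles) >= 2 and all(a in NUCLEOTIDES for a in alleles)


rows = [dict(zip(COLUMNS, line.split("|"))) for line in EMAIL_ROWS]
for row in rows:
    print(row["name"], row["locType"], is_simple_substitution(row))
```

The redundant length and allele checks are there because the rows above show that the 'class' value alone does not guarantee a one-base region; whether 'exact' alone is sufficient is exactly the open question.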
