Hi Deli,

You can use the Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables) to
retrieve only the regions of the multiple alignment that you are interested
in. Select the canFam2 assembly and then:

group: Comparative Genomics
track: Conservation
table: multiz4way

If you have 1,000 regions or fewer, at this point you can hit the "define
regions" button and enter the regions you want to retrieve. Next, leave
"output format: MAF - multiple alignment format" selected, and hit "get
output." You should see the portions of the alignment file that correspond
to your regions.

If you have more than 1,000 regions, you can create a custom track of your
regions (in BED format) from this page:
http://genome.ucsc.edu/cgi-bin/hgCustom, then choose the multiz4way table in
the Table Browser and create an intersection with your new custom track. On
the intersection page, you would choose "Base-pair-wise intersection (AND)
of Conservation and User Track."

I also have some input from one of our engineers about treating every
difference from human/mouse/rat as an error in the canFam2 reference
assembly:

In your chr38 indel example, the reference's additional "T" does seem like
it would cause a frameshift and early stop, assuming that dog has a gene
where other species' genes have aligned. However, it seems unlikely that
absolutely all reference variants that differ from human/mouse/rat are
reference errors; especially when considering SNPs, couldn't some of those
differences be caused by true variation?

If you have further questions, please feel free to contact us again at
[email protected].

--
Brooke Rhead
UCSC Genome Bioinformatics Group

On 07/24/11 12:47, Deli Liu wrote:
> Hi,
>
> I mapped next-generation sequencing data of a dog genome to the dog
> reference genome (canFam2), and search for the SNP and indel. But I found
> that some SNPs and indels are due to the mistakes in dog reference genome,
> which are different from the human or mouse genome. For example:
>
> I found a indel of my input dog genome by comparing the canFam2 reference
> genome, and the reference is T in this base pair region, but the input has
> one base pair deletion in this region:
>
> chr38:8,242,507-8,242,507 T -
>
> However, I found that the human, mouse and rat have no T in this base pair
> from conservation in browser. So there may be a mistake in this base pair
> region of canFam2 reference genome.
>
> Now I want search for all my SNP and indel results back to the canFam2
> reference genome, and find out which are the real changes, and which are due
> to the canFam2 reference mistake by comparing the dog to the human and mouse
> genomes. And I download the multiple alignments files from dog canFam2, but
> it is very hard to output all the possible mistakes related to my SNP and
> indel regions. So is it possible I can get the conservation from some
> specific regions (my SNP and indel)?
>
> Thanks a lot.
>
>
_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
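
A note on the custom-track route described in the reply above: BED
coordinates use a 0-based start and an exclusive end, while positions such
as chr38:8,242,507-8,242,507 are 1-based and inclusive, so the start has to
be shifted by one when building the BED file. The following is only a rough
sketch of that conversion. The function names and the track name
"my_variants" are made up for illustration, and it assumes the variant
positions are already available as position strings in the form shown in
the original question.

```python
import re

# Browser-style position: chrom:start-end, 1-based and inclusive,
# with optional thousands separators (e.g. chr38:8,242,507-8,242,507).
POSITION_RE = re.compile(r"^(\w+):([\d,]+)-([\d,]+)$")


def position_to_bed(position):
    """Convert one 1-based inclusive position string to a BED line.

    BED uses a 0-based start and an exclusive end, so only the start
    coordinate is shifted down by one.
    """
    match = POSITION_RE.match(position.strip())
    if match is None:
        raise ValueError(f"unrecognized position: {position!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates: {position!r}")
    return "\t".join([chrom, str(start - 1), str(end)])


def build_custom_track(positions, track_name="my_variants"):
    """Return custom-track text: a track line followed by BED lines.

    track_name is a hypothetical label chosen for this example.
    """
    lines = [f'track name="{track_name}" description="SNP and indel regions"']
    lines.extend(position_to_bed(p) for p in positions)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The single-base indel position from the original question.
    print(build_custom_track(["chr38:8,242,507-8,242,507"]), end="")
```

The printed text can be pasted into the custom track page, after which the
intersection with the multiz4way table proceeds as described in the reply.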
