Hello Wei,

We have put together some instructions for generating the data. This analysis requires some environment setup first:

1) kent source downloaded and compiled
   http://genome.ucsc.edu/FAQ/FAQdownloads.html#download27
   README instructions for compiling are in the download

2) public MySQL server setup (local ~/.hg.conf)
   http://genome.ucsc.edu/FAQ/FAQdownloads.html#download29
   more about .hg.conf is in the source READMEs (above)

3) download of the dog genome's 2bit sequence file
   http://genome.ucsc.edu/FAQ/FAQdownloads.html#download1
   ftp to the downloads server, but go into gbdb rather than goldenPath:
   gbdb/canFam2/canFam2.2bit

Once set up, this is the analysis path:

# obtain the PSLs for the human transmap
hgsql -Ne 'select * from transMapAlnUcscGenes where tName="chr14" and tEnd > 11073413 and tStart < 11077825' canFam2 \
    | cut -f 2- > transMapAlnUcscGenes.psl

# obtain CDS, first get the ids that are in the alignments and strip off
# the unique suffix (starting with `-')
cut -f 10 transMapAlnUcscGenes.psl | sed 's/-.*$//' > transMapAlnUcscGenes.acc

# then get the CDS from the hgFixed database using these ids, for instance:
hgsql -Ne 'select * from transMapGeneUcscGenes where id in ("uc003vml.2", "uc009bcv.1", "uc003vmm.2")' hgFixed \
    > transMapAlnUcscGenes.cds

# convert to genePred format
mrnaToGene -ignoreUniqSuffix -insertMergeSize=0 \
    -cdsFile=transMapAlnUcscGenes.cds \
    transMapAlnUcscGenes.psl transMapAlnUcscGenes.gp

# get fasta of CDS sequence
# (the -genomeSeqs path below is where the file lives at UCSC; use the
# location of your own copy of canFam2.2bit from step 3)
getRnaPred -cdsOnly -genomeSeqs=/hive/data/genomes/canFam2/canFam2.2bit \
    canFam2 transMapAlnUcscGenes.gp all transMapAlnUcscGenes.fa

Hopefully this helps. Please let us know if you need more assistance,
Jennifer

---------------------------------
Jennifer Jackson
UCSC Genome Informatics Group
http://genome.ucsc.edu/

On 5/10/10 7:09 AM, Wei Zheng wrote:
> Hi,
>
> I want to study the nucleotide identity of coding region sequences
> between different species (e.g. human and dog). Now I have a list of
> known UCSC genes in human, and found their homologs in dog through the
> transMapAlnUcscGenes. This way I can get fasta sequences for the
> entire gene regions (including introns) for both species. But how can
> I obtain only the CDS sequences for such comparisons? Thanks a lot.
>
> Wei
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
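
A note on the CDS lookup step in the analysis path: the example query lists three ids by hand, but the id list could also be generated from transMapAlnUcscGenes.acc. The following is only a sketch of one way to do that. The sample .acc contents are the ids from the example query, the variable names are made up for illustration, and the final hgsql call is left commented out because it assumes the ~/.hg.conf setup from step 2.

```shell
# Sample .acc file for illustration; in the real workflow this file is
# produced by the "cut -f 10 ... | sed" step (it may contain duplicates).
printf '%s\n' uc003vml.2 uc009bcv.1 uc003vmm.2 uc003vml.2 > transMapAlnUcscGenes.acc

# De-duplicate the ids, wrap each one in double quotes, and join with commas.
ids=$(LC_ALL=C sort -u transMapAlnUcscGenes.acc | sed 's/.*/"&"/' | paste -sd, -)

# Build the same query as in the example, with the generated id list.
query="select * from transMapGeneUcscGenes where id in ($ids)"
echo "$query"

# Assumes hgsql is on PATH and ~/.hg.conf points at the public MySQL server:
# hgsql -Ne "$query" hgFixed > transMapAlnUcscGenes.cds
```

With many alignments in the region this avoids copying ids by hand; the rest of the analysis path (mrnaToGene, getRnaPred) is unchanged.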
