Hello Wei,

We have put together some instructions for generating the data. The
analysis requires some environment setup first:

1) kent source downloaded and compiled
http://genome.ucsc.edu/FAQ/FAQdownloads.html#download27
README instructions for compiling are in the download

2) access to the public MySQL server set up (local ~/.hg.conf)
http://genome.ucsc.edu/FAQ/FAQdownloads.html#download29
more about .hg.conf is in the source READMEs (above)

3) download of the dog genome's 2bit sequence file
http://genome.ucsc.edu/FAQ/FAQdownloads.html#download1
ftp to the downloads server, but go into gbdb rather than goldenPath:
gbdb/canFam2/canFam2.2bit
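
For step 2, a minimal ~/.hg.conf could be written as sketched below. This
is only a sketch: write_hg_conf is a hypothetical helper name, and the four
settings are the ones UCSC publishes for public MySQL access as far as we
know, so please check them against the FAQ link above before relying on
them.

```shell
# Sketch: write a minimal kent-tools config for the public MySQL server.
# write_hg_conf is a hypothetical helper; the settings below are assumed
# to match UCSC's published public-access values (verify against the FAQ).
write_hg_conf() {
    conf="$1"
    cat > "$conf" <<'EOF'
db.host=genome-mysql.soe.ucsc.edu
db.user=genomep
db.password=password
central.db=hgcentral
EOF
    # the kent tools expect the file to be readable only by its owner
    chmod 600 "$conf"
}
```

Calling write_hg_conf "$HOME/.hg.conf" would create the file in the usual
place; note that it overwrites any existing file at that path.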

Once set up, this is the analysis path:

# obtain the PSLs for the human transmap

hgsql -Ne 'select * from transMapAlnUcscGenes where tName="chr14" and tEnd > 11073413 and tStart < 11077825' canFam2 \
  | cut -f 2- > transMapAlnUcscGenes.psl

# obtain CDS, first get the ids that are in the alignments and strip off
# the unique suffix (starting with `-')

cut -f 10 transMapAlnUcscGenes.psl |sed 's/-.*$//' >transMapAlnUcscGenes.acc
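
To see what the sed expression does, here is a small sketch. The function
name strip_uniq_suffix is hypothetical and the sample id is made up for
illustration; real qName values in your PSL may look different.

```shell
# strip_uniq_suffix is a hypothetical name; it applies the same sed
# expression as above: delete everything from the first '-' onward.
strip_uniq_suffix() {
    sed 's/-.*$//'
}

# made-up alignment id, for illustration only
echo 'uc003vml.2-1.1' | strip_uniq_suffix
# prints: uc003vml.2
```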

# then get the CDS from the hgFixed database using these ids, for instance:

hgsql -Ne 'select * from transMapGeneUcscGenes where id in ("uc003vml.2", "uc009bcv.1", "uc003vmm.2")' hgFixed \
  > transMapAlnUcscGenes.cds
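
The ids in the query above are typed in by hand. If you have many genes,
the list could be built from transMapAlnUcscGenes.acc instead. This is a
sketch, assuming one id per line in that file; make_in_list is a
hypothetical helper name, not part of the kent tools.

```shell
# make_in_list (hypothetical helper): turn a file with one id per line
# into a de-duplicated, double-quoted, comma-separated list for SQL "in (...)".
make_in_list() {
    sort -u "$1" | sed 's/.*/"&"/' | paste -sd, -
}

acc=transMapAlnUcscGenes.acc
# only run the query when hgsql is installed and the id file is non-empty
if command -v hgsql >/dev/null 2>&1 && [ -s "$acc" ]; then
    hgsql -Ne "select * from transMapGeneUcscGenes where id in ($(make_in_list "$acc"))" hgFixed \
      > transMapAlnUcscGenes.cds
fi
```

The sort -u matters because several alignments can share one source gene,
so the same id may appear more than once in the .acc file.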

# convert to genePred format

mrnaToGene -ignoreUniqSuffix -insertMergeSize=0 \
  -cdsFile=transMapAlnUcscGenes.cds transMapAlnUcscGenes.psl \
  transMapAlnUcscGenes.gp

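# get fasta of CDS sequence
# (the -genomeSeqs path below is the location on our servers; point it at
# your local copy of canFam2.2bit from step 3)

getRnaPred -cdsOnly -genomeSeqs=/hive/data/genomes/canFam2/canFam2.2bit \
  canFam2 transMapAlnUcscGenes.gp all transMapAlnUcscGenes.fa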


Hopefully this helps. Please let us know if you need more assistance,
Jennifer

---------------------------------
Jennifer Jackson
UCSC Genome Informatics Group
http://genome.ucsc.edu/

On 5/10/10 7:09 AM, Wei Zheng wrote:
> Hi,
>
> I want to study the nucleotide identity of coding region sequences
> between different species (e.g. human and dog). Now I have a list of
> known UCSC genes in human, and found their homologs in dog through the
> transMapAlnUcscGenes. This way I can get fasta sequences for the
> entire gene regions (including introns) for both species. But how can
> I obtain only the CDS sequences for such comparisons? Thanks a lot.
>
> Wei
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome