Hi Tom,

Thank you for reporting your concerns regarding orthologous genes.
We will add these issues to the list of items for our engineers to
look into.

Vanessa Kirkup Swing
UCSC Genome Bioinformatics Group



---------- Forwarded message ----------
From: thomas pringle <[email protected]>
Date: Mon, Jan 23, 2012 at 4:39 AM
Subject: [Genome] 3 more bugs in "Orthologous Genes in Other Species"
To: [email protected]
Cc: Donna Karolchik <[email protected]>


1. See attached graphic. If there is no ortholog to human for a given
mouse or yeast gene, how can there be links to a human gene in Gene
Details and Gene Sorter? These links were correct, so for some reason
the ortholog and its browser location were not found (even though
both are given at the Gene Details and Gene Sorter links).

I think the problem here is New UCSC Genes vs. Old UCSC Genes, with
certain old uc numbers no longer corresponding to anything. That may
mean the underlying tables for "Orthologous Genes in Other Species"
were never updated to New UCSC Genes. That is, Gene Sorter and this
table aren't on the 'same page'.

Example:
Human Gene RPL11 (uc001bhk.3)
S. cerevisiae Gene RPL11A (YPR102C)




2. I am seeing hundreds of cases where a single human gene has
multiple yeast 'orthologs'. This contradicts our claim that we are
using best reciprocal blastp. That would put orthologs in a 1-1
relationship. The real problem is yeast has a lot of duplicated
proteins that are nearly identical to each other. Humans also have a
lot of duplicated proteins that are nearly identical to each other and
homologous to the yeast set. It is very problematic to match these up.

yeast    human
YGR214W  uc003cjr.2
YLR048W  uc003cjr.2

YHR216W  uc003vmx.2
YLR432W  uc003vmx.2
YML056C  uc003vmx.2

YBL068W  uc004cvb.2
YER099C  uc004cvb.2
YHL011C  uc004cvb.2

sacCer3.sgdGene
sacCer3.hgBlastTab fields
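
The one-to-many cases listed above can be found mechanically. Here is a
minimal sketch in Python, assuming the yeast/human pairs have already
been exported to a list of tuples. The function name one_to_many and
the PAIRS list are illustrative only and not part of any UCSC tool.
Under a true best reciprocal blastp assignment, the result should be
empty.

```python
from collections import defaultdict

# Pairs copied from the listing above: (yeast gene, human UCSC gene ID).
PAIRS = [
    ("YGR214W", "uc003cjr.2"),
    ("YLR048W", "uc003cjr.2"),
    ("YHR216W", "uc003vmx.2"),
    ("YLR432W", "uc003vmx.2"),
    ("YML056C", "uc003vmx.2"),
    ("YBL068W", "uc004cvb.2"),
    ("YER099C", "uc004cvb.2"),
    ("YHL011C", "uc004cvb.2"),
]


def one_to_many(pairs):
    """Map each human ID claimed by more than one yeast gene to those genes."""
    by_human = defaultdict(set)
    for yeast, human in pairs:
        by_human[human].add(yeast)
    # A 1-1 ortholog relationship would leave every set with a single member.
    return {h: sorted(y) for h, y in by_human.items() if len(y) > 1}


violations = one_to_many(PAIRS)
for human, yeast_genes in sorted(violations.items()):
    print(human, len(yeast_genes), ",".join(yeast_genes))
```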

3. A prose problem deep in the tables:

http://genome-test.cse.ucsc.edu/cgi-bin/hgGene?hgsid=3502800&hgg_do_otherProteinAli=on&hgg_otherPepTable=mm9.knownGenePep&hgg_otherId=uc009mke.1

"The single best exon  chains extending over more than 60% of the
query protein were included. Exon chains that extended over 60% of the
query and matched at least 60% of the protein's amino acids were also
included."

That should read:
"The single best exon  chains extending over more than 60% of the
query protein were included. Other exon chains that extended over 60%
of the query and matched at least 60% of the protein's amino acids
were also included."


4. More unclear prose:

Schema for Human Proteins - Human Proteins Mapped by Chained tBLASTn

"ID (including gaps) 97.9%, coverage (of both) 100.0%,..."

I don't know what "coverage (of both)" could possibly mean. We have qStart,
qEnd, tStart, tEnd which make sense. The match starts at position such
and such in the query and ends a ways later. That match corresponds to
a position range in the target. I don't see how or why that should be
shortened.

name    qStart  qEnd    tStart  tEnd
YAL012W 8       392     17      397
YAL061W 2       289     10      271
YAL060W 9       237     17      220
YAL058W 49      443     82      466
YAL054C 66      707     37      692
YAL048C 2       623     1       591
YAL046C 19      107     13      102
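
For comparison, the matched range in the query and in the target can be
computed directly from these numbers. This is a rough sketch in Python
that makes two assumptions: the four numeric columns are qStart, qEnd,
tStart, and tEnd in that order, and the coordinates are half-open, so a
span is end minus start. The function name spans is illustrative only.
It does not attempt to reproduce the "coverage (of both)" figure, since
the definition of that figure is exactly what is unclear.

```python
# Rows copied from the listing above. The column order (name, qStart, qEnd,
# tStart, tEnd) is an assumption based on the field names mentioned in the text.
ROWS = """\
YAL012W 8 392 17 397
YAL061W 2 289 10 271
YAL060W 9 237 17 220
YAL058W 49 443 82 466
YAL054C 66 707 37 692
YAL048C 2 623 1 591
YAL046C 19 107 13 102
"""


def spans(text):
    """Return {name: (query span, target span)}, assuming half-open coordinates."""
    out = {}
    for line in text.splitlines():
        name, q_start, q_end, t_start, t_end = line.split()
        q_span = int(q_end) - int(q_start)
        t_span = int(t_end) - int(t_start)
        out[name] = (q_span, t_span)
    return out


for name, (q_span, t_span) in spans(ROWS).items():
    print(name, q_span, t_span)
```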





_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
