Good Morning Christophe:
The databases for genome assemblies can be seen from our public
MySQL server with the command:
$ mysql -N -A -hgenome-mysql.cse.ucsc.edu -ugenomep -ppassword \
-e "select name from dbDb where active=1;" hgcentral | sort
See also the scripts in the source tree, which can help with your
mirror jobs:
http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/product/scripts/
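As a rough sketch of how the database list from the mysql command above
could feed a mirror job: the function below reads database names on stdin
and prints one rsync command per database without running anything. The
function name, the rsync module path, and the local MySQL data directory are
assumptions for illustration, not taken from the scripts in the source tree;
check them against your own setup before use.

```shell
#!/bin/sh
# Sketch only: print (do not run) one rsync command per database name
# read from stdin.
# RSYNC_BASE and DEST_DIR are assumptions; adjust them for your mirror.
RSYNC_BASE="${RSYNC_BASE:-rsync://hgdownload.cse.ucsc.edu/mysql}"
DEST_DIR="${DEST_DIR:-/var/lib/mysql}"

print_mirror_cmds() {
    while read -r db; do
        # Skip blank lines in the input.
        [ -n "$db" ] || continue
        printf 'rsync -avzP %s/%s/ %s/%s/\n' \
            "$RSYNC_BASE" "$db" "$DEST_DIR" "$db"
    done
}

# Possible usage, piping in the list of active databases:
#   mysql -N -A -hgenome-mysql.cse.ucsc.edu -ugenomep -ppassword \
#     -e "select name from dbDb where active=1;" hgcentral \
#     | sort | print_mirror_cmds
```

Printing the commands first makes it easy to review them before anything is
copied.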
Those databases are the primary data for a genome assembly with all
the annotations built on that genome assembly.
Some annotations require extra tables and databases that are common
across a number of genome assemblies. Hence they are outside any
particular genome database. Some of these external databases have
specific usage rights; see:
http://hgdownload.cse.ucsc.edu/goldenPath/swissProt/database/README.txt
The extra databases are:
hgcentral - primary database the browser uses to find everything else;
    also contains dynamic user/session "cart" data
visiGene - virtual microscope for mouse sections
sp090821, etc. - "Swiss-Prot" aka UniProt database,
    obtained from files at ftp.expasy.org/databases/uniprot/
    used in UCSC genes track on various databases
uniprot - the newest version of the Swiss-Prot databases; can simply
    be a symlink to the newest sp* database directory
    used in UCSC genes track on various databases
go - the Gene Ontology database, obtained from:
    http://www.godatabase.org/dev/database/
    used in the UCSC genes track
proteins090821, etc. - a combination of the UniProt data mentioned above
    and data from HGNC http://www.genenames.org/
    used in the UCSC genes track and proteome browser
proteome - should merely be a symlink to the most recent proteins*
    database
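The uniprot and proteome symlinks described above could be maintained with
something like the following sketch. It picks the newest dated name for a
given prefix from a list of directory names. The function name and the data
directory in the usage comment are hypothetical, and it assumes the suffix
is always six digits (YYMMDD) within the same century, so that a plain sort
orders the names by date.

```shell
#!/bin/sh
# Sketch only: from names on stdin, print the newest one that consists of
# the given prefix followed by exactly six digits (YYMMDD).
# A plain lexical sort works because the date part is zero-padded.
newest_dated_db() {
    prefix="$1"
    grep -E "^${prefix}[0-9]{6}\$" | sort | tail -n 1
}

# Possible usage (the /var/lib/mysql path is an assumption):
#   cd /var/lib/mysql
#   ln -sfn "$(ls | newest_dated_db sp)" uniprot
#   ln -sfn "$(ls | newest_dated_db proteins)" proteome
```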
Yes, the numbers in sp090821 and proteins090821 are dates: 090821 is 2009-08-21.
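For illustration, the date could be decoded from a database name with a
small helper like this one. The function name is hypothetical, and it
assumes the name is letters followed by exactly six digits and that the
two-digit year means 20YY.

```shell
#!/bin/sh
# Sketch only: turn a name such as sp090821 into 2009-08-21.
# Assumes letters followed by exactly six digits (YYMMDD), year 20YY.
# Returns non-zero for names that do not fit that pattern.
db_date() {
    out=$(printf '%s\n' "$1" | sed -n \
        's/^[A-Za-z]*\([0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9]\)$/20\1-\2-\3/p')
    [ -n "$out" ] || return 1
    printf '%s\n' "$out"
}
```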
The newest versions of these databases are used in newer annotation tracks.
It is possible some of the oldest ones are used in older genome databases.
To see the correspondence:
$ mysql -N -A -hgenome-mysql.cse.ucsc.edu -ugenomep -ppassword \
-e "select * from gdbPdb;" hgcentral
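To get a quick list of which protein databases are still referenced, the
output of that query could be reduced as in the sketch below. It assumes the
output has two tab-separated columns, genome database first and protein
database second (mysql prints tab-separated rows when its output is piped);
the function name is hypothetical. Databases that never show up in the
result are candidates for a closer look before you decide to drop them.

```shell
#!/bin/sh
# Sketch only: read "genomeDb<TAB>proteomeDb" rows on stdin and print each
# distinct protein database once, sorted. The two-column layout is an
# assumption about the gdbPdb output.
proteome_dbs_in_use() {
    awk -F '\t' 'NF >= 2 && $2 != "" { print $2 }' | sort -u
}

# Possible usage:
#   mysql -N -A -hgenome-mysql.cse.ucsc.edu -ugenomep -ppassword \
#     -e "select * from gdbPdb;" hgcentral | proteome_dbs_in_use
```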
--Hiram
[email protected] wrote:
> Hello UCSC's Team,
>
> We are performing a local installation of the Genome Browser.
>
> During this installation, the downloading of numerous data and databases
> occurs. Most of these databases represent the builds of genomes such as
> mm8, hg18, rn4, tetNig1, panTro2, etc.
>
> Among these list of databases (115 at this date), there are some
> databases which do not represent genome's build but information about
> molecule or other type of information such as:
>
> VisiGene, uniprot, proteome, go080130, go, hgfixed, hgcentral, mysql,
> proteins040315, proteins050415 , proteins051015, proteins060115, etc.,
> ... , sp040315, sp050415, etc., ..., sp 090821.
>
> We have guessed the meaning of some of them based on their names, but we
> still have no clue about the meaning and purpose of some of them;
>
> QUESTIONS:
>
> 1. May you let us know what the databases with names sp090821,
> spXXXXXX (where X are number) along with proteinsXXXXXX represent?
>
> 2. Are the number associated to them related to the date version?
>
> 3. If so, do we still have to keep all the version to get fully
> functional Genome Browser, or can we just keep the latest version of it?
>
> 4. Do you have any list which gives details about these 115
> databases we have retrieved?
>
> I Thank you in advance for your reply,
>
> Best Regards,
>
> Christophe LEGENDRE, PhD
_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome