[Genome] Standard data naming and access

kenny daily (UCI) Mon, 25 May 2009 11:00:08 -0700

Are there any standards for how genome data is named and made available on
the UCSC Genome Browser (GB)? Some of the issues I see:


1. Some genomes have a file for all chromosomes called chromFa.tar.gz, while
some are named after the genome, such as felCat3.tar.gz.
2. Some files are in zip format, some are gzipped tar files.
3. Some are available through rsync, some are available through wget.
4. Some have 2bit versions, some have FASTA versions, and some may have both.
5. README files give incorrect download commands. For calJac1, the rsync
command is given as
"rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/calJac1/bigZips/chromFa.tar.gz ."
but chromFa.tar.gz does not exist there; the command should be
"rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/calJac1/bigZips/calJac1.tar.gz ."
etc...

This makes it very difficult to script the updating and downloading of large
amounts of data without manually curating a list of files to download. Some
of these things could be taken care of with something as simple as a
symlink, or with a build script associated with an alignment and all related
genomes, etc.
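
As a rough illustration of the fallback logic a download script currently
needs, here is a minimal Python sketch. The function names are hypothetical,
the file names "chromFa.zip" and "<assembly>.2bit" are my assumptions about
how the zip and 2bit variants might be named, and the preference order is
arbitrary. Only chromFa.tar.gz, <assembly>.tar.gz and the rsync URL layout
come from the examples above.

```python
from typing import Iterable, List, Optional


def candidate_names(db: str) -> List[str]:
    """Return archive names to try for an assembly, most preferred first."""
    return [
        "chromFa.tar.gz",   # generic name used by some assemblies
        f"{db}.tar.gz",     # assembly-specific name, e.g. felCat3.tar.gz
        "chromFa.zip",      # ASSUMPTION: name of the zip variant
        f"{db}.2bit",       # ASSUMPTION: name of the 2bit variant
    ]


def pick_archive(db: str, available: Iterable[str]) -> Optional[str]:
    """Pick the first candidate that appears in a directory listing.

    `available` is the list of file names found in the assembly's bigZips
    directory, however the caller obtained it. Returns None if no
    candidate is present.
    """
    present = set(available)
    for name in candidate_names(db):
        if name in present:
            return name
    return None


def rsync_url(db: str, name: str) -> str:
    """Build the rsync URL for a file, following the layout quoted above."""
    return f"rsync://hgdownload.cse.ucsc.edu/goldenPath/{db}/bigZips/{name}"


if __name__ == "__main__":
    listing = ["README.txt", "calJac1.tar.gz"]
    chosen = pick_archive("calJac1", listing)
    if chosen is not None:
        print(rsync_url("calJac1", chosen))
```

With consistent naming (or a symlink from one name to the other), the
candidate list would shrink to a single entry and none of this guessing
would be needed.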

What can be done to help with or alleviate this? Is there any documentation
that tells people or labs submitting data how they should send it in?

Thank you,

-- 
Kenny Daily
[email protected]
http://www.kennydaily.net/

--- Prediction is very difficult, especially about the future. (Niels Bohr)
---
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
