Are there any standards for genome data available on UCSC GB? Some of the issues I see:

1. Some genomes have a single archive for all chromosomes called chromFa.tar.gz, while others name it after the genome, such as felCat3.tar.gz.
2. Some files are in zip format, others are gzipped tar files.
3. Some are available through rsync, others through wget.
4. Some have 2bit versions, some have FASTA versions, and some have both.
5. Some README files give incorrect download commands. For calJac1, the rsync command is given as
   "rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/calJac1/bigZips/chromFa.tar.gz ."
   but chromFa.tar.gz does not exist; it should be
   "rsync://hgdownload.cse.ucsc.edu/goldenPath/calJac1/bigZips/calJac1.tar.gz ."

There are other inconsistencies along the same lines. Together they make it very difficult to script the download and update of large amounts of data without manually curating a list of files. Some of these issues could be taken care of with something as simple as a symlink, or with a build script associated with an alignment and all of its related genomes.

What can be done to help with or alleviate this? Is there any documentation that people or labs submitting data can read to learn how they should send in their data?

Thank you,

-- 
Kenny Daily
http://www.kennydaily.net/
---
Prediction is very difficult, especially about the future. (Niels Bohr)
---
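
P.S. To illustrate the kind of workaround the inconsistent naming forces, here is a minimal Python sketch that tries several candidate file names per genome. The host and directory layout come from the rsync command quoted above. Serving the same tree over HTTP, the names chromFa.zip, <db>.2bit and <db>.fa.gz, and the helper names candidate_urls and first_existing are my own assumptions for illustration, not anything documented by UCSC.

```python
import urllib.request

# Host and path layout are taken from the rsync command in the README;
# that the same tree is served over HTTP is an assumption.
BASE = "http://hgdownload.cse.ucsc.edu/goldenPath/{db}/bigZips/"

# Naming patterns to try, in order. The first two are the ones observed
# for existing genomes; the remaining three are guesses.
PATTERNS = [
    "chromFa.tar.gz",
    "{db}.tar.gz",
    "chromFa.zip",
    "{db}.2bit",
    "{db}.fa.gz",
]


def candidate_urls(db):
    """Return every URL worth trying for the genome database `db`."""
    base = BASE.format(db=db)
    return [base + pattern.format(db=db) for pattern in PATTERNS]


def first_existing(db, timeout=30):
    """Return the first candidate URL that answers a HEAD request, else None."""
    for url in candidate_urls(db):
        request = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(request, timeout=timeout):
                return url
        except OSError:
            # HTTPError and URLError are both OSError subclasses:
            # treat any failure as "not there" and try the next name.
            continue
    return None
```

Calling first_existing("calJac1") would probe each name in turn. A consistent naming scheme, or a symlink for each genome, would make this guessing unnecessary.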
