Hello Kenny,

For your items 1-4: Some of this is legacy data, and going back to alter previously released data or file names often causes the most confusion for users. In other cases, the data is formatted or compressed in distinct ways for specific uses.
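If the existing names stay as they are, one way for a download script to cope is to try each known naming pattern in turn rather than keep a hand-curated file list. The following is only a minimal Python sketch, not an official UCSC tool. The names `chromFa.tar.gz` and `<db>.tar.gz` come from your message; the `http://` form of the bigZips path and the names `chromFa.zip` and `<db>.2bit` are assumptions for illustration, and the function names are hypothetical.

```python
from urllib.request import Request, urlopen
from urllib.error import URLError

# Assumption: the bigZips directory is reachable over HTTP at the same
# host and path as the rsync URL quoted in the README.
BASE = "http://hgdownload.cse.ucsc.edu/goldenPath/{db}/bigZips/{name}"

# Naming patterns to try, in order of preference. The first two are from
# the original message; the last two are assumed names for the zip and
# 2bit variants.
NAME_PATTERNS = ["chromFa.tar.gz", "{db}.tar.gz", "chromFa.zip", "{db}.2bit"]


def candidate_urls(db):
    """Return every URL the sequence archive for assembly `db` might use."""
    return [BASE.format(db=db, name=p.format(db=db)) for p in NAME_PATTERNS]


def url_exists(url, timeout=30):
    """Return True if a HEAD request for `url` succeeds with status 200."""
    req = Request(url, method="HEAD")
    try:
        with urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        return False


def first_available(db, exists=url_exists):
    """Return the first candidate URL that exists, or None if none do."""
    for url in candidate_urls(db):
        if exists(url):
            return url
    return None
```

Calling `first_available("calJac1")` would probe each candidate and return the first one the server reports as present. The `exists` argument is there so the probing step can be replaced, for example with a check against a local mirror listing.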
For item 5: Your point is clear. The README text describing download methods is from a generalized template. It may be both helpful and possible to customize this a bit more.

Have you seen our documentation on setting up a mirror? Some of the tools there may be helpful for you, since they can streamline downloading data in batch. Some help pages with links to additional support and advice for navigating data sets and downloading in batch:

http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#Download
http://genome.ucsc.edu/FAQ/FAQlicense#license4
http://genome.ucsc.edu/admin/mirror.html

Thank you for all of your comments. They will be passed along to the development team for consideration. Accuracy, consistency, and usability have always been important goals for the UCSC Genome Browser Group.

Jennifer Jackson
UCSC Genome Bioinformatics Group

kenny daily (UCI) wrote:
> Are there any standards for genome data available on UCSC GB? Some of the
> issues I see:
>
> 1. Some genomes have a file for all chromosomes called chromFa.tar.gz, while
> some are named after the genome, such as felCat3.tar.gz.
> 2. Some files are in zip format, some are gzipped tar files.
> 3. Some are available through rsync, some are available through wget.
> 4. Some have 2bit versions, or fasta versions, or maybe both.
> 5. Incorrect commands for downloading in README files (For calJac1, an rsync
> command is given as
> "rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/calJac1/bigZips/chromFa.tar.gz .".
> But, chromFa.tar.gz does not exist, it should be
> "rsync://hgdownload.cse.ucsc.edu/goldenPath/calJac1/bigZips/calJac1.tar.gz .")
> etc...
>
> This makes it very difficult to script updating and downloading large
> amounts of data, without manually curating a list of files to download. Some
> of these things could be taken care of with something as simple as a
> symlink, a build script associated with an alignment and all related
> genomes, etc.
>
> What can be done to help with or alleviate this? Is there any documentation
> for people or labs submitting data to read about how they should send in
> their data?
>
> Thank you,
>
>
_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
