Re: [Genome] Standard data naming and access

Jennifer Jackson Mon, 25 May 2009 11:51:06 -0700

Hello Kenny,

For your items 1-4: Some of this is legacy data; going back and 
altering previously released data or names often causes the most 
confusion for users. In other cases, the data is formatted or 
compressed in distinct ways for specific uses.


For item 5: Your point is clear. The README text describing download 
methods is from a generalized template. It may be helpful/possible to 
customize this a bit more.

Have you seen our documentation regarding the set-up of a mirror? Some 
of the tools there may be helpful for you - they can streamline the 
download of data in batch.

Some help pages with links to additional support/advice for navigating 
data sets and downloading in batch:
http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#Download
http://genome.ucsc.edu/FAQ/FAQlicense#license4
http://genome.ucsc.edu/admin/mirror.html
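
In the meantime, a download script could cope with the naming differences 
by trying a short list of candidate file names for each assembly. The 
Python below is only a rough sketch of that idea, not an official tool. 
The host and bigZips path come from the rsync commands quoted in this 
thread; the function names, the "chromFa.zip" and "<db>.2bit" file names, 
and the idea of passing in a directory listing obtained separately are 
assumptions made for illustration.

```python
# Host and path layout are taken from the rsync commands quoted in this thread.
RSYNC_BASE = "rsync://hgdownload.cse.ucsc.edu/goldenPath"


def candidate_names(db):
    """Return archive names to try for assembly `db`, in order of preference."""
    return [
        "chromFa.tar.gz",   # generic name used by some assemblies
        f"{db}.tar.gz",     # assembly-specific name, e.g. calJac1.tar.gz
        "chromFa.zip",      # ASSUMED name for zip-format releases
        f"{db}.2bit",       # ASSUMED name for 2bit releases
    ]


def candidate_urls(db):
    """Build the full rsync URL for every candidate name."""
    return [f"{RSYNC_BASE}/{db}/bigZips/{name}" for name in candidate_names(db)]


def pick_first_available(db, listing):
    """Return the URL of the first candidate found in `listing`.

    `listing` is a collection of file names in the assembly's bigZips
    directory, obtained separately. Returns None if nothing matches.
    """
    for name in candidate_names(db):
        if name in listing:
            return f"{RSYNC_BASE}/{db}/bigZips/{name}"
    return None
```

For example, pick_first_available("calJac1", {"calJac1.tar.gz"}) would 
select the assembly-specific archive rather than the chromFa.tar.gz name 
given in the README.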

Thank you for all of your comments. They will be passed along to the 
development team for consideration. Accuracy, consistency, and usability 
have always been important goals for the UCSC Genome Browser Group.

Jennifer Jackson
UCSC Genome Bioinformatics Group

kenny daily (UCI) wrote:
> Are there any standards for genome data available on UCSC GB? Some of the
> issues I see:
>
> 1. Some genomes have a file for all chromosomes called chromFa.tar.gz, while
> some are named after the genome, such as felCat3.tar.gz.
> 2. Some files are in zip format, some are gzipped tar files.
> 3. Some are available through rsync, some are available through wget.
> 4. Some have 2bit versions, or fasta versions, or maybe both.
> 5. Incorrect commands for downloading in README files (For calJac1, an rsync
> command is given as
> "rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/calJac1/bigZips/chromFa.tar.gz .".
> But, chromFa.tar.gz does not exist, it should be
> "rsync://hgdownload.cse.ucsc.edu/goldenPath/calJac1/bigZips/calJac1.tar.gz .")
> etc...
>
> This makes it very difficult to script updating and downloading large
> amounts of data, without manually curating a list of files to download. Some
> of these things could be taken care of with something as simple as a
> symlink, a build script associated with an alignment and all related
> genomes, etc.
>
> What can be done to help with or alleviate this? Is there any documentation
> for people or labs submitting data to read about how they should send in
> their data?
>
> Thank you,
>
>   
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
