On Wed, Nov 30, 2011 at 11:38 AM, Peter Rice <[email protected]> wrote:
> On 11/30/2011 11:32 AM, Pjotr Prins wrote:
>
>> Git is not very good for storing large data files, which we would want
>> to fetch partially. My suggestion would be to have a plain old file
>> repo, e.g. on S3, which can be mirrored by others.
>
> We had issues with large files in the EMBOSS release, and make those
> available via rsync to add to the developers CVS checkout. They include the
> NCBI taxonomy source and index files and the ontology source and index
> files.
>
> The next EMBOSS release will include http and ftp URLs as valid inputs for
> any data type, so EMBOSS could use remote files for format tests. I'll look
> into how other repositories could be added.
>
> I had to add some extra qualifiers to allow queries and offsets to be
> specified, and rewrote the query language parsing to merge very similar code
> segments.
>
> regards,
>
> Peter Rice
> EMBOSS Team

How about an OBF-hosted FTP site then, if we want big data? I guess we'd
mostly be adding files, and changes/deletions should be rare, so a full
version-tracking repository isn't essential if we are disciplined about
updating README files or more formal metadata.

Peter
_______________________________________________
EMBOSS mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/emboss
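
To illustrate what "more formal metadata" for an add-mostly file site could look like, here is a minimal Python sketch that records the size and SHA-256 checksum of every file under a directory, and compares two such records to spot the rare change or deletion. Nothing here comes from EMBOSS or OBF: the function names (sha256_of, build_manifest, changed_files, dump_manifest) and the JSON layout are assumptions made up for this example.

```python
import hashlib
import json
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (by relative path) to its size and checksum."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Use forward slashes so the manifest reads the same on any OS.
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = {
                "bytes": os.path.getsize(full),
                "sha256": sha256_of(full),
            }
    return manifest


def changed_files(old, new):
    """List paths added, removed, or altered between two manifests."""
    return sorted(p for p in set(old) | set(new) if old.get(p) != new.get(p))


def dump_manifest(manifest):
    """Serialise a manifest as stable, human-readable JSON."""
    return json.dumps(manifest, indent=2, sort_keys=True)
```

A maintainer could run build_manifest() on the upload directory after adding files and publish the dump_manifest() output next to the README; a mirror could then rebuild the manifest locally and use changed_files() to see what needs to be fetched again.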
