Thank you for v5.0 (it compiles OK for me on Mandriva Linux 2005 and on the server version of Ubuntu Linux 6.06).
Below are my observations from trying to install EMBL locally. I did it my own way in the end, so this may not be so useful for this list. I do not need very fast entry access, just a local cache that avoids flooding EBI/NCBI with requests for entries and reduces network traffic.

First I tried dbxflat. It works fine, but indexing takes time: I estimated a week's run time to index the release, and close to a day on average for the daily updates (the planned incremental indexing will help). This means there has to be a machine dedicated to keeping EMBL up to date, because the job is CPU-bound. Not unreasonable, but I wanted it to work on, say, a cheap external hard drive, and to be ready sooner.

Then I tried BioPerl. It looked like that run would take 3 weeks to finish, so it was not workable. I have not looked at how many entries change between releases, but having the new release built soon after it is available must be good.

Then I tried putting each record in a directory derived from its accession number: AACI02000001 goes in AACI/0200, AX101010 in AX1/010, and so on. Each directory has a two-column table (LOOKUP_LIST) with lines like these:

    AACI02000001.1 1
    AACI02000002.1 1
    AACI02000003.1 1
    AACI02000004.1 2
    AACI02000005.1 2
    AACI02000006.1 2
    AACI02000007.1 2

where the first column is the versioned accession number and the second is the name of the file that contains the corresponding entry. The entry files are gzip-compressed and named 1.gz, 2.gz, etc. The release files stay compressed and are deleted after splitting, so the total extra space required does not exceed 20% or so of the distribution size.

Creation time is about 26 hours, and a typical daily file takes 2-4 minutes; download and import can run in parallel (not with threads of course, but by launching the script twice). I tried to balance disk and RAM use by caching file handles etc., but RAM use does not exceed 330 MB at any time (and stays much lower most of the time).

To access a record, I do "zcat $file | seqret ....." and the output is then parsed by BioPerl. The access time varies between 0.03 and 0.3 seconds depending on size, time since last access, speed of the disk, compression ratio and logic, humidity outside etc.

Well, I may have reinvented something again, but at least it filled my little need. I don't know if others have had the same need, or if it is a feature that EMBOSS should have.

Niels L

PS - one of my mistakes was to get a big, slow USB-2 drive under Linux. The drive is OK, but the ext3 file system broke completely. I was advised to use FireWire or ATA/SATA instead, which also allows health monitoring with smartctl et al. (USB does not).

_______________________________________________
EMBOSS mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/emboss
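
For readers who want to try the directory-plus-LOOKUP_LIST layout described in the post, here is a minimal Python sketch of the read side. It is not the author's script (that was not posted), and several things are assumptions: the function names are made up, and the post gives only two examples of the accession-to-directory mapping rather than the rule. The split used here (two chunks of ceil(length/3) characters) was picked only because it reproduces both examples. The sketch also assumes each N.gz file holds whole EMBL flat-file entries, each ending in a "//" line and carrying its accession on an "AC" line.

```python
import gzip
import math
from pathlib import Path


def accession_dir(accession: str) -> Path:
    """Map an unversioned accession to a two-level directory.

    ASSUMPTION: the real rule is not stated in the post. Taking two chunks
    of ceil(len/3) characters merely reproduces its two examples.
    """
    k = math.ceil(len(accession) / 3)
    return Path(accession[:k]) / accession[k:2 * k]


def read_lookup(directory: Path) -> dict:
    """Parse LOOKUP_LIST into {versioned accession: file name}."""
    table = {}
    with open(directory / "LOOKUP_LIST") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) == 2:  # skip blank or malformed lines
                table[fields[0]] = fields[1]
    return table


def has_accession(entry_lines, accession: str) -> bool:
    """True if any AC line of the entry lists the given accession."""
    for line in entry_lines:
        if line.startswith("AC"):
            if accession in line.replace(";", " ").split():
                return True
    return False


def fetch_entry(root: Path, versioned: str) -> str:
    """Return the flat-file text of one entry; KeyError if it is absent."""
    accession = versioned.split(".")[0]  # AC lines carry no version suffix
    directory = root / accession_dir(accession)
    file_name = read_lookup(directory)[versioned]
    entry = []
    with gzip.open(directory / f"{file_name}.gz", "rt") as handle:
        for line in handle:
            entry.append(line)
            if line.startswith("//"):  # end-of-entry marker
                if has_accession(entry, accession):
                    return "".join(entry)
                entry = []
    raise KeyError(versioned)
```

Calling fetch_entry(Path("/data/embl"), "AACI02000004.1") (the root path is hypothetical) would return the entry text, which could then be handed to seqret or BioPerl as in the post. Re-reading LOOKUP_LIST on every call keeps the sketch short; a long-running process would probably cache the parsed tables.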
