Hi Aengus,

> dbiflat is dying on me with the refseq nucleotides
>
> I am using
>
> dbiflat -dbname RefSeqN -idformat refseq -directory . -filenames "*.gbff"
> -release 17.0 -date 14/06/06 -fields "acnum,seqvn,des,keyword,taxon"
>
> and I get
>
> Warning: Duplicate ID skipped: 'XM_757618' All hits will point to first ID
> found
> Warning: Duplicate ID skipped: 'XM_757619' All hits will point to first ID
> found
> Warning: Duplicate ID skipped: 'XM_757620' All hits will point to first ID
> found
>
> EMBOSS An error in embdbi.c at line 1238:
> Error in embDbiSortWriteFields, expected entry NM_001004399 not found

Hmmm ... which refseq files are you using? (I'm trying now with the
ftp://ftp.ncbi.nih.gov/refseq/release/complete/*.gbff.gz files.)

Could you have a duplicate entry from an old file?

Could your sort space have filled up? Running with -noclean will leave the
temporary files around and make it easier to check for truncation, though
simply rerunning would probably give a different result if that is the
problem.

(Note to self: must find a way to test the error messages without really
filling up a disk.)

regards,

Peter

_______________________________________________
EMBOSS mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/emboss
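
A rough way to check the "duplicate entry from an old file" possibility is to scan the flat files for entry names that occur more than once and see which files they come from. The sketch below is only an illustration and is not part of EMBOSS: the function name find_duplicate_ids is made up for this example, it assumes the files are uncompressed GenBank-format text, and it assumes the ID dbiflat complains about corresponds to the name on each entry's LOCUS line.

```python
import glob
import sys
from collections import defaultdict


def find_duplicate_ids(paths):
    """Return {locus_name: [files]} for names that appear more than once.

    Hypothetical helper: assumes the duplicated ID matches the LOCUS name.
    """
    seen = defaultdict(list)
    for path in paths:
        with open(path, "r", errors="replace") as handle:
            for line in handle:
                # Each GenBank-format entry begins with a LOCUS line; the
                # second whitespace-separated token is the locus name.
                if line.startswith("LOCUS"):
                    fields = line.split()
                    if len(fields) > 1:
                        seen[fields[1]].append(path)
    return {name: files for name, files in seen.items() if len(files) > 1}


if __name__ == "__main__":
    # Default pattern mirrors the -filenames value from the original command.
    pattern = sys.argv[1] if len(sys.argv) > 1 else "*.gbff"
    duplicates = find_duplicate_ids(sorted(glob.glob(pattern)))
    for name, files in sorted(duplicates.items()):
        print(name, " ".join(files))
```

Run in the directory given to -directory, this prints each repeated name followed by the files containing it. If the same name turns up in two different files, a leftover file from an older release would be a plausible culprit.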
