Hi, I tried to index the RefSeq database:
1) I downloaded all the ftp://ftp.ncbi.nih.gov/refseq/release/complete/complete*.genomic.gbff.gz files (GB format)

2) gunzipped them

3) Added the rs_dna entry to my .embossrc file:

    DB rs_dna [
        type: "N"
        method: "emblcd"
        format: "GB"
        dir: "/home/users/friard/data/refseq_genomic/"
        file: "*.gbff"
        release: ""
        comment: "RefSeq Genomic (upd)"
        indexdir: "/home/users/friard/data/refseq_genomic/"
    ]

4) Ran dbiflat with the following arguments (from the directory where the files are stored):

    dbiflat
    Index a flat file database
    Database name: rs_dna
          EMBL : EMBL
         SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
            GB : Genbank, DDBJ
        REFSEQ : Refseq
    Entry format [SWISS]: REFSEQ
    Database directory [.]:
    Wildcard database filename [*.dat]: *.gbff
    Release number [0.0]:
    Index date [00/00/00]:

The indexes were created, but when I try to retrieve a sequence (e.g. seqret rs_rna:NC_000004), the result is not the correct sequence but a different one carrying the NC_000004 ID!

I also downloaded the files in FASTA format and tried to index them with the dbifasta command (format: ncbi), without success:

    seqret rs_dna:nc_000004
    Reads and writes (returns) sequences
    Error: Unable to read sequence 'rs_dna:nc_000004'
    Died: seqret terminated: Bad value for '-sequence' and no prompt

Has anyone indexed RefSeq successfully?

Thank you in advance

--
Olivier Friard
Laboratorio di Biologia Computazionale
Facoltà di Scienze MFN
Università di Torino
via Accademia Albertina 13, 10124 TORINO (Italy)
tel. +39 011 6704689
