On 21 Apr 2006, at 16:00, Olivier Friard wrote:

> The indexes were created but when I try to access to a sequence (i.e
> seqret rs_rna:NC_000004) then results is not the correct sequence but
> an other one with the NC_000004 ID!

Is it just finding the wrong sequence or could you have duplicate 
entries in the data?  Use entret to see if the entry really has that 
ID.
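
A quick way to look for duplicates directly in the flat files is sketched below. It assumes the entry name is the second field of each LOCUS line, and duplicate_ids is a hypothetical helper name, not an EMBOSS program.

```shell
# Sketch only: list entry names that occur more than once across a set of
# GenBank-format flat files. Assumes the ID is the second field of each
# LOCUS line; duplicate_ids is a hypothetical helper, not part of EMBOSS.
duplicate_ids() {
    # grep -h suppresses file names so only the LOCUS lines are passed on
    grep -h '^LOCUS' "$@" | awk '{ print $2 }' | sort | uniq -d
}
```

Calling it as duplicate_ids *.gbff in the data directory prints each repeated name once; empty output suggests the names are unique in those files.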

We found that seqret returned incorrect sequences, or none at all, when 
some of the individual sequence files were larger than 2 GB.  In these 
cases you can use the new dbx* indexing programs (dbxflat, for example), 
which handle large files properly.
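
To see whether any of your files are in that range, something like the following may help. It is only a sketch: large_flatfiles is a hypothetical helper, the *.gbff pattern is taken from the setup described in this message, and treating 2 GB as 2^31 bytes is an assumption.

```shell
# Sketch only: print flat files larger than a size threshold.
# large_flatfiles is a hypothetical helper; arguments are a directory and
# an optional threshold in bytes (default assumed to be 2^31 bytes).
large_flatfiles() {
    dir=$1
    limit=${2:-2147483648}
    for f in "$dir"/*.gbff; do
        [ -f "$f" ] || continue              # skip if the glob matched nothing
        size=$(( $(wc -c < "$f") ))          # size in bytes, whitespace stripped
        if [ "$size" -gt "$limit" ]; then
            echo "$f"
        fi
    done
}
```

Run as large_flatfiles /path/to/flatfiles; any file it lists is a candidate for the large-file problem described above.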

> Does anyone index the RefSeq successfully?

Yes.  We use it here without problems, but indexed with dbxflat.

It gets indexed with:

dbxflat -dbresource all -auto -idformat refseq -dbname refseq_all \
  -filenames \*.gbff

...and the emboss.default entry looks like:

DB refseq_all
  [
     type: N
     comment: "Refseq"
     method: emboss
     format: genbank
     dbalias: refseq_all
     directory: /data/public/DNA/Refseq/Current/all
     file: *.gbff
  ]

with the resource section being:

RES all [ type: Index
   idlen:  15
   acclen: 15
   svlen:  15
   keylen: 15
   deslen: 15
   orglen: 15
]
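
If you want to check that the idlen value above is generous enough for your data, a rough sketch follows. It assumes idlen limits the length of the IDs being indexed and that the ID is the second field of each LOCUS line; longest_id is a hypothetical helper, not an EMBOSS program.

```shell
# Sketch only: report the length of the longest entry name found on LOCUS
# lines, for comparison against the idlen value in the resource definition.
# longest_id is a hypothetical helper, not part of EMBOSS.
longest_id() {
    awk '/^LOCUS/ { n = length($2); if (n > max) max = n }
         END { print max + 0 }' "$@"
}
```

Run as longest_id *.gbff and compare the number printed with idlen.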


Simon.
-- 
Simon Andrews PhD
Bioinformatics Dept.
The Babraham Institute

+44 (0) 1223 496463

_______________________________________________
EMBOSS mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/emboss
