On 30 Nov 2006, at 13:57, Jean Mao wrote:

> Hi,
>
> Does any program in EMBOSS package can make use of the Reference Sequence
> Databases? I indexed refseq databases with dbxflat and run showfeat against
> them but receive error about has zero length sequence :
>
> Warning: Sequence 'refseqnt-id:NG_002612' has zero length, ignored
> Error: Unable to read sequence 'refseqnt:NG_002612'

NG_ sequences in RefSeq are a bit odd. They're not real sequences but virtual collections of other sequences, joined together to make longer assemblies. The records themselves don't contain any sequence data (hence the zero-length sequence error), just pointers to parts of other sequences.

The NCBI website has a facility to join the fragments together and create a 'real' sequence from them. You could probably do the same if you had all the underlying sequences available, but it's not likely to be possible during indexing.

EMBOSS works fine with normal RefSeq files, but I wouldn't say it's reasonable to expect it to cope with these virtual records. It would be nice if NCBI offered an option to download rendered versions of these sequences, but as many of them are pretty big it might be a very large data set.

TTFN

Simon.
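
As a rough illustration of the point about NG_ records, the sketch below scans a flat file and lists the entries that carry only pointers and no sequence, so they could be filtered out before indexing. It is not part of EMBOSS. It assumes the RefSeq files are in GenBank flat-file format, that virtual records express their pointers on a CONTIG line, and that real sequence data follows an ORIGIN line; the function names are made up for this example.

```python
from typing import Iterable, Iterator, List, NamedTuple


class RecordSummary(NamedTuple):
    name: str         # identifier taken from the LOCUS line
    has_contig: bool  # True if the record has a CONTIG line (assumed pointer list)
    seq_length: int   # number of residue letters found after ORIGIN


def summarise_records(lines: Iterable[str]) -> Iterator[RecordSummary]:
    """Yield one summary per record; records are assumed to end with '//'."""
    name, has_contig, in_origin, seq_length = None, False, False, 0
    for line in lines:
        if line.startswith("//"):
            if name is not None:
                yield RecordSummary(name, has_contig, seq_length)
            # Reset state for the next record.
            name, has_contig, in_origin, seq_length = None, False, False, 0
        elif line.startswith("LOCUS"):
            parts = line.split()
            name = parts[1] if len(parts) > 1 else "(unknown)"
        elif line.startswith("CONTIG"):
            has_contig = True
            in_origin = False
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif in_origin:
            # Sequence lines hold a position number plus blocks of letters;
            # count only the letters.
            seq_length += sum(ch.isalpha() for ch in line)


def virtual_records(lines: Iterable[str]) -> List[str]:
    """Names of records that have a CONTIG line but no sequence data."""
    return [r.name for r in summarise_records(lines)
            if r.has_contig and r.seq_length == 0]
```

Passing an open file handle to `virtual_records` should return the names of the records that would trigger the zero-length warning, under the assumptions above. It only reports them; it does not try to assemble the underlying fragments.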
