Scott Hazelhurst wrote:
I don't know whether this is a bug or a feature, but I discovered that nthseq skips empty sequences in its counting. So if you have 10 sequences and the fifth is empty, then nthseq -number 6 actually returns the 7th sequence. It does print a warning that the sequence is empty, but not that it's skipping it (and if you are running this in a pipeline you wouldn't see the warning anyway). I couldn't find any documentation on this. I found this problem in a data set from some collaborators: we ran dust and then used biosed to remove Ns. Obviously this leaves some sequences unusable. While it is understandable why nthseq behaves the way it does, the problem is that in an automated setup it may be difficult to make the adjustment.
We will take a look. Zero-length sequences are routinely ignored in EMBOSS. We will check whether it is possible to use an alternative method for counting in nthseq and in any other application that counts input sequences.
Of course, if the nth sequence is empty, nthseq would have to return a failure to read it.
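To make the two counting schemes concrete, here is a minimal Python sketch. It is not EMBOSS code and does not reflect how nthseq is implemented; it assumes simple FASTA input, and the names read_fasta, nth_record and skip_empty are hypothetical, chosen only for illustration. With skip_empty=True it imitates the counting described above; with skip_empty=False it counts every record and fails when the requested one is empty.

```python
def read_fasta(text):
    """Parse FASTA text into (identifier, sequence) pairs, keeping empty records."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def nth_record(records, number, skip_empty=False):
    """Return the record at 1-based position `number`.

    skip_empty=True  : empty records are not counted (the reported behaviour).
    skip_empty=False : every record is counted; an empty one is an error.
    """
    if skip_empty:
        records = [r for r in records if r[1]]
    if not 1 <= number <= len(records):
        raise IndexError(f"no sequence number {number}")
    name, seq = records[number - 1]
    if not seq:
        raise ValueError(f"sequence {number} ({name}) is empty")
    return name, seq


# Ten records named s1..s10; the fifth has no residues (hypothetical test data).
demo = "".join(f">s{i}\n{'' if i == 5 else 'ACGT'}\n" for i in range(1, 11))
records = read_fasta(demo)

print(nth_record(records, 6, skip_empty=True)[0])   # counts non-empty records only
print(nth_record(records, 6, skip_empty=False)[0])  # counts every record
```

In a pipeline, the positional variant has the advantage that a request for an empty record raises an error instead of silently returning a neighbouring sequence.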
regards,
Peter Rice
