Hi,
I've discovered an...unfortunate feature. In the old biolib, I had
'seqlabel' and 'seqheader', the former would...well, let me just give
the definitions:
-- | Return sequence label (first word of header)
seqlabel :: Sequence a -> SeqData
seqlabel (Seq l _ _) = case B.words l of (x:_) -> x; [] -> B.empty
-- | Return full header.
seqheader :: Sequence a -> SeqData
seqheader (Seq l _ _) = l
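Here is a minimal, self-contained sketch of the difference between the two old definitions. The `Sequence` type and the `SeqData` synonym are simplified stand-ins made up for this illustration, not the real biolib types, which carry more. The header string is also a hypothetical example.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B

-- Simplified stand-ins (assumption, not the real biolib definitions):
-- header, residues, and optional quality data.
type SeqData = B.ByteString
data Sequence a = Seq SeqData SeqData (Maybe SeqData)

-- | Return sequence label (first word of header)
seqlabel :: Sequence a -> SeqData
seqlabel (Seq l _ _) = case B.words l of (x:_) -> x; [] -> B.empty

-- | Return full header.
seqheader :: Sequence a -> SeqData
seqheader (Seq l _ _) = l

-- A hypothetical record whose header has extra fields after the name.
example :: Sequence ()
example = Seq (B.pack "contig_1 length=42 cov=3.5") (B.pack "ACGT") Nothing

main :: IO ()
main = do
  putStrLn (B.unpack (seqlabel example))   -- contig_1
  putStrLn (B.unpack (seqheader example))  -- contig_1 length=42 cov=3.5
```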
The current Bio.Core only defines seqlabel, and it returns the full
header. This is unfortunate, since I often generate tables with the
sequence name first, and any spaces or tabs in the header mess up the
columns.
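The following sketch shows how the column problem arises. The row layout (name, tab, count), the `row`, `firstWord`, and `columns` helpers, and the header string are all hypothetical. Columns are counted the way whitespace-splitting tools would count them.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B

-- One row of a hypothetical table: sequence name first, then a count.
row :: B.ByteString -> Int -> B.ByteString
row name n = B.concat [name, B.pack "\t", B.pack (show n)]

-- First word of a header, as the old seqlabel returned.
firstWord :: B.ByteString -> B.ByteString
firstWord h = case B.words h of (x:_) -> x; [] -> B.empty

-- A hypothetical header with a space and a tab after the name.
hdr :: B.ByteString
hdr = B.pack "contig_1 length=42\tcov=3.5"

-- Number of whitespace-separated columns in a row.
columns :: B.ByteString -> Int
columns = length . B.words

main :: IO ()
main = do
  print (columns (row hdr 17))              -- 4: the header spills into extra columns
  print (columns (row (firstWord hdr) 17))  -- 2: name and count, as intended
```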
I'm not quite sure how to resolve this, but the options I see are:
1. Reintroduce the old behavior by modifying seqlabel to return only the
first word, and add seqheader to the BioSeq class:
-- | The 'BioSeq' class models sequence data, and any data object that
-- represents a biological sequence should implement it.
class BioSeq s where
    seqlabel  :: s -> SeqLabel
+   seqheader :: s -> SeqLabel
    seqdata   :: s -> SeqData
    seqlength :: s -> Offset
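Option 1 could look roughly like this once an instance implements it. This is a minimal sketch that assumes plain type synonyms for `SeqLabel`, `SeqData` and `Offset`; the real Bio.Core types are defined differently. `FastaRec` is a hypothetical record standing in for what biofasta and friends would provide.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B
import Data.Int (Int64)

-- Simplified stand-ins (assumption): the real Bio.Core types differ.
type SeqLabel = B.ByteString
type SeqData  = B.ByteString
type Offset   = Int64

-- Option 1: seqlabel returns the first word, seqheader the full header.
class BioSeq s where
  seqlabel  :: s -> SeqLabel
  seqheader :: s -> SeqLabel
  seqdata   :: s -> SeqData
  seqlength :: s -> Offset

-- Hypothetical record type standing in for a real instance.
data FastaRec = FastaRec { recHeader :: B.ByteString, recSeq :: B.ByteString }

instance BioSeq FastaRec where
  seqheader  = recHeader
  seqlabel r = case B.words (recHeader r) of (x:_) -> x; [] -> B.empty
  seqdata    = recSeq
  seqlength  = B.length . recSeq  -- lazy ByteString length is Int64

rec1 :: FastaRec
rec1 = FastaRec (B.pack "contig_1 length=4") (B.pack "ACGT")

main :: IO ()
main = do
  putStrLn (B.unpack (seqlabel rec1))   -- contig_1
  putStrLn (B.unpack (seqheader rec1))  -- contig_1 length=4
  print (seqlength rec1)                -- 4
```

Note that with this option, every existing instance must change, because seqlabel's meaning changes and seqheader has no sensible default.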
2. Keep seqlabel as it is now (returning the full header), and introduce
a new function, say seqid, that returns only the first word:
-- | The 'BioSeq' class models sequence data, and any data object that
-- represents a biological sequence should implement it.
class BioSeq s where
+   seqid     :: s -> SeqLabel
    seqlabel  :: s -> SeqLabel
    seqdata   :: s -> SeqData
    seqlength :: s -> Offset
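One possible refinement of option 2, offered only as a sketch and not as part of the proposal, is to give seqid a default method derived from seqlabel. As in the previous sketch, the type synonyms and `FastaRec` are simplified or hypothetical.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B
import Data.Int (Int64)

-- Simplified stand-ins (assumption): the real Bio.Core types differ.
type SeqLabel = B.ByteString
type SeqData  = B.ByteString
type Offset   = Int64

-- Option 2: seqlabel keeps returning the full header; seqid is new.
class BioSeq s where
  seqid     :: s -> SeqLabel
  seqlabel  :: s -> SeqLabel
  seqdata   :: s -> SeqData
  seqlength :: s -> Offset
  -- Hypothetical default: the first word of the full label.
  seqid s = case B.words (seqlabel s) of (x:_) -> x; [] -> B.empty

-- Hypothetical record type; its instance does not mention seqid,
-- so it picks up the default above.
data FastaRec = FastaRec { recHeader :: B.ByteString, recSeq :: B.ByteString }

instance BioSeq FastaRec where
  seqlabel  = recHeader
  seqdata   = recSeq
  seqlength = B.length . recSeq

rec1 :: FastaRec
rec1 = FastaRec (B.pack "contig_1 length=4") (B.pack "ACGT")

main :: IO ()
main = do
  putStrLn (B.unpack (seqid rec1))     -- contig_1
  putStrLn (B.unpack (seqlabel rec1))  -- contig_1 length=4
```

With such a default, existing instances would keep compiling unchanged. Only formats where "first word of the header" is the wrong identifier would need to override seqid.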
Note that the actual changes must be implemented in the *users* of this
class, e.g. biofasta, biofastq, biopsl, and whatnot.
Thoughts most welcome.
-k
_______________________________________________
Biohaskell mailing list
[email protected]
http://malde.org/cgi-bin/mailman/listinfo/biohaskell