On Tue, Aug 30, 2005 at 01:31:22PM +0200, Joel Reymont wrote:
> Can I beg for examples?

This is from some old code, slightly polished for presentation - the
code for parsing a DNS domain name label in DNS packets:

    parseLabel :: CharParser st Label
    parseLabel = (<?> "label") $ do
        len <- byte
        guard (len <= 63)
        s <- count (fromIntegral len) anyChar
        return $! stringToLabel s

Today I would rather process Word8 lists:

    type ByteParser st a = GenParser Word8 st a

    parseLabel :: ByteParser st Label

Here is a parser for the whole DNS message:

    parseMessage :: CharParser st Domain -> CharParser st Message
    parseMessage pDomain = do
        msgid <- parseMsgID
        header <- parseMsgHeader
        qdcount <- fmap fromIntegral beWord16
        ancount <- fmap fromIntegral beWord16
        nscount <- fmap fromIntegral beWord16
        arcount <- fmap fromIntegral beWord16
        questions <- count qdcount (parseQuestion pDomain)
        answers <- count ancount (parseRR pDomain)
        auth <- count nscount (parseRR pDomain)
        additional <- count arcount (parseRR pDomain)
        return (Message { msgID = msgid,
                          msgHeader = header,
                          msgQuestions = questions,
                          msgAnswers = answers,
                          msgAuth = auth,
                          msgAdditional = additional })

The pDomain parameter is for dealing with DNS domain suffix
compression - parsing a domain name may require jumping to an earlier
part of the message. Today I would either use a MonadReader to hide
this parameter, or a different parser monad with random access.

In another application, for reading some binary files, I defined a
BinaryParser monad, with one implementation using Parsec and another
using unboxed arrays. IIRC, the implementation using UArrays was about
30-60 times faster than the one using Parsec, probably because Parsec
uses lists. Surprisingly, the biggest speed boost was caused (again
IIRC) by writing a specialised "times" implementation for the UArray
version.

    class (Functor m, Monad m) => BinaryParser m where
        byte :: m Word8

        -- note: count must be monad-generic here (e.g. replicateM),
        -- Parsec's own count only works for Parsec parsers
        bytes :: Int -> m (UArray Int Word8)
        bytes n = do
            l <- count n byte
            return $! (listArray (0, n-1) l)

        int8  :: m Int8
        int16 :: m Int16
        int32 :: m Int32
        int64 :: m Int64

        word16 :: m Word16
        word32 :: m Word32
        word64 :: m Word64
        word16 = fmap fromIntegral int16
        word32 = fmap fromIntegral int32
        word64 = fmap fromIntegral int64

        asciiz :: m (UArray Int Word8)
        asciiz = do
            s <- decodeStr []
            return $! (listArray (0, length s - 1) s)
          where
            decodeStr acc = do
                b <- byte
                if b == 0
                    then return (reverse acc)
                    else decodeStr (b : acc)

        eof :: m ()
        atEof :: m Bool

        times :: Int -> m a -> m [a]
        times = count

Again, I would do it a bit differently today. For example, this
interface says nothing about endianness.

Recently I've used a different interface for a different protocol.
This is a state monad where the state is a slice of the buffer:

    newtype BufferReader a
    instance Monad BufferReader
    instance MonadZero BufferReader

    byteAt :: Int -> BufferReader Word8

    -- changes the state for subsequent computation
    skip :: Int -> BufferReader ()

    -- runs the given computation in a slice
    slicing :: Int -> Int -> BufferReader a -> BufferReader a
    slicing start len br = ...

    many :: BufferReader a -> BufferReader [a]

    runBufferReader :: WithBuffer b => b -> BufferReader a -> IO (Either String a)

darcs' FastPackedString module, which was recently put into a separate
library by Don Stewart (http://www.cse.unsw.edu.au/~dons/code/fps),
could be nice for parsing binary messages, because:

- it is (supposed to be) fast and memory efficient
- it supports fast (O(1)) random access and slices (tailPS, initPS,
  dropPS, takePS) with a purely functional interface
- it is based on bytes

But I am slightly worried about the possibility of space leaks, when a
small slice holds the entire message in memory.

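To make the "state is a slice of the buffer" idea and the space-leak
worry concrete, here is a rough sketch. It is not the BufferReader code
described above: it uses Data.ByteString from the bytestring package
(the descendant of FastPackedString) as a stand-in, it is pure rather
than running in IO, and SliceReader, failWith, takeCopy and label are
names made up for illustration. The semantics chosen for slicing
(offsets relative to the current slice, outer state left untouched) are
a guess.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8, Word16)

-- Hypothetical reader: the state is the remaining slice of the buffer.
newtype SliceReader a = SliceReader
  { runSliceReader :: B.ByteString -> Either String (a, B.ByteString) }

instance Functor SliceReader where
  fmap f (SliceReader r) = SliceReader $ \s -> case r s of
    Left e        -> Left e
    Right (a, s') -> Right (f a, s')

instance Applicative SliceReader where
  pure a = SliceReader $ \s -> Right (a, s)
  SliceReader rf <*> SliceReader ra = SliceReader $ \s -> case rf s of
    Left e        -> Left e
    Right (f, s') -> case ra s' of
      Left e         -> Left e
      Right (a, s'') -> Right (f a, s'')

instance Monad SliceReader where
  SliceReader r >>= k = SliceReader $ \s -> case r s of
    Left e        -> Left e
    Right (a, s') -> runSliceReader (k a) s'

-- Abort the computation with an error message.
failWith :: String -> SliceReader a
failWith msg = SliceReader $ \_ -> Left msg

-- Look at the byte at an offset in the current slice; consumes nothing.
byteAt :: Int -> SliceReader Word8
byteAt i = SliceReader $ \s ->
  if i >= 0 && i < B.length s
    then Right (B.index s i, s)
    else Left ("byteAt: offset " ++ show i ++ " out of range")

-- Drop n bytes from the front of the current slice (an O(1) operation).
skip :: Int -> SliceReader ()
skip n = SliceReader $ \s ->
  if n >= 0 && n <= B.length s
    then Right ((), B.drop n s)
    else Left "skip: past end of slice"

-- Run a reader inside the sub-slice [start, start+len) of the current
-- slice; the outer position is left unchanged (assumed semantics).
slicing :: Int -> Int -> SliceReader a -> SliceReader a
slicing start len (SliceReader r) = SliceReader $ \s ->
  if start >= 0 && len >= 0 && start + len <= B.length s
    then case r (B.take len (B.drop start s)) of
           Left e       -> Left e
           Right (a, _) -> Right (a, s)
    else Left "slicing: out of range"

-- Big-endian 16-bit word, so endianness is explicit in the name.
beWord16 :: SliceReader Word16
beWord16 = do
  hi <- byteAt 0
  lo <- byteAt 1
  skip 2
  return (fromIntegral hi * 256 + fromIntegral lo)

-- Consume n bytes and return a fresh copy of them. B.copy detaches the
-- result from the original buffer, so a small field does not keep the
-- whole message alive.
takeCopy :: Int -> SliceReader B.ByteString
takeCopy n = SliceReader $ \s ->
  if n >= 0 && n <= B.length s
    then Right (B.copy (B.take n s), B.drop n s)
    else Left "takeCopy: past end of slice"

-- A DNS-style label: one length byte (at most 63), then that many bytes.
label :: SliceReader B.ByteString
label = do
  len <- byteAt 0
  if len > 63
    then failWith "label: length byte exceeds 63"
    else skip 1 >> takeCopy (fromIntegral len)
```

Copying costs O(n) per field, so it only pays off for values that
outlive the message; short-lived intermediate slices can stay shared.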
Random thoughts:

- I often use Template Haskell to automate the generation of parsers
  and unparsers (it helps tremendously when you have many data types
  with many fields to parse/unparse, and even more if the protocol
  changes often)
- there are some libraries for dealing with serialisation in Haskell,
  for example: http://www.cs.helsinki.fi/u/ekarttun/SerTH/
- there is an attempt to write an operating system in Haskell:
  http://www.cse.ogi.edu/~hallgren/House/
  you can check how it handles IPv4/UDP/TCP headers

Best regards
Tomasz