I've been looking into building parsers at runtime (from a config
file), and in my case it's beneficial to fit them into the context of
a larger parser written with Data.Attoparsec.Text. This code is
untested for practical use, so I doubt you'll see performance
comparable to the aforementioned regex packages, but it could be worth
exploring if you need to mix and match parsers or if the definitions
can change arbitrarily at runtime.

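import qualified Data.Text as T
import Data.Attoparsec.Text
import Control.Applicative ((<|>), many)

-- Match one literal ligature, e.g. "ffi".
parseLigature :: String -> Parser T.Text
parseLigature x = string (T.pack x)

-- Fallback: consume any single character as a one-character Text.
charToText :: Parser T.Text
charToText = T.singleton <$> anyChar

-- Try each ligature in list order; the first match wins.
-- (try is kept out of Parsec habit; attoparsec parsers always
-- backtrack on failure, so it isn't strictly needed.)
buildChain :: [String] -> Parser T.Text
buildChain []     = fail "buildChain: no ligatures given"
buildChain [x]    = parseLigature x
buildChain (x:xs) = try (parseLigature x) <|> buildChain xs

-- ordering matters here: "ffi" must come before "ff", since "ff" is a
-- prefix of "ffi" and would otherwise match first
ligatures :: Parser T.Text
ligatures = buildChain ["ffi", "th", "ff", "fi", "fl"]

myParser :: Parser [T.Text]
myParser = many (try ligatures <|> charToText)

-- at ghci prompt: parseOnly myParser (T.pack "the fluffiest bunny")
-- Right ["th","e"," ","fl","u","ffi","e","s","t"," ","b","u","n","n","y"]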
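Since the ligature list is meant to come from a config file, here's a
rough sketch (equally untested in practice) of building the chain from
runtime data. It assumes a made-up config format of one ligature per
line, and the names ligatureParser, ligaturesFromConfig, tokenize and
findLigatures are my own, not part of attoparsec. Sorting the
candidates longest-first means the config doesn't have to list "ffi"
before "ff" by hand. findLigatures keeps only the matches, which is one
way to look for several substrings in one left-to-right scan (each
position still tries every candidate, so this is not Aho-Corasick).

```haskell
import qualified Data.Text as T
import Data.Attoparsec.Text (Parser, anyChar, choice, parseOnly, string)
import Control.Applicative ((<|>), many)
import Data.List (sortBy)
import Data.Maybe (catMaybes)
import Data.Ord (Down (..), comparing)

-- Build a parser from a runtime list of ligatures. Longer candidates
-- are tried first, so a prefix like "ff" can never shadow "ffi".
-- sortBy is stable, so equal-length entries keep their config order.
ligatureParser :: [T.Text] -> Parser T.Text
ligatureParser ligs = choice (map string sorted)
  where
    sorted = sortBy (comparing (Down . T.length)) (filter (not . T.null) ligs)

-- Hypothetical config format: one ligature per line, surrounding
-- whitespace stripped, blank lines ignored.
ligaturesFromConfig :: T.Text -> [T.Text]
ligaturesFromConfig = filter (not . T.null) . map T.strip . T.lines

-- Split input into ligatures and single characters (same idea as
-- myParser, but with the ligature list supplied at runtime).
tokenize :: [T.Text] -> T.Text -> Either String [T.Text]
tokenize ligs = parseOnly (many (ligatureParser ligs <|> T.singleton <$> anyChar))

-- Return only the ligatures found, skipping everything else.
findLigatures :: [T.Text] -> T.Text -> Either String [T.Text]
findLigatures ligs = fmap catMaybes . parseOnly (many step)
  where
    step = (Just <$> ligatureParser ligs) <|> (Nothing <$ anyChar)
```

For example, findLigatures with ["fi", "ff", "th", "ffi", "fl"] (packed
to Text) on "the fluffiest bunny" should give Right ["th","fl","ffi"],
even though "ffi" is listed after "ff".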

On Tue, Jul 5, 2011 at 12:09 PM, Bryan O'Sullivan <[email protected]> wrote:
> On Tue, Jul 5, 2011 at 11:01 AM, Tillmann Vogt
> <[email protected]> wrote:
>>
>> I looked at Data.Text
>> http://hackage.haskell.org/packages/archive/text/0.5/doc/html/Data-Text.html
>> and
>> http://hackage.haskell.org/packages/archive/stringsearch/0.3.3/doc/html/Data-ByteString-Search.html
>>
>> but they don't have a function that can search several substrings in one
>> run.
>
> Here's what you want:
> http://hackage.haskell.org/packages/archive/text-icu/0.6.3.4/doc/html/Data-Text-ICU-Regex.html
> _______________________________________________
> Haskell-Cafe mailing list
> [email protected]
> http://www.haskell.org/mailman/listinfo/haskell-cafe
