On Tuesday 05 July 2011, 20:01:26, Tillmann Vogt wrote: > Hi, > > For my font library I need A function that can handle ligatures. It can > be explained best with an example: > > f ["Th", "ff", "fi", "fl", "ffi"] "The fluffiest bunny" > > should be evaluated to > > ["Th", "e", " ", "fl", "u", "ffi", "e", "s", "t", " ", "b", "u", "n", > "n", "y" ] > > I looked at Data.Text > http://hackage.haskell.org/packages/archive/text/0.5/doc/html/Data-Text. > html and > http://hackage.haskell.org/packages/archive/stringsearch/0.3.3/doc/html/ > Data-ByteString-Search.html > > but they don't have a function that can search several substrings in one > run.
Well, Data.ByteString[.Lazy].KarpRabin does provide simultaneous search for multiple substrings - it does not, however, provide direct splitting. But in my tests, unless the number of substrings was large, multiple separate (Boyer-Moore) passes with manual merging of the offset lists were much faster [there's a possible space leak for lazy ByteStrings if any pattern does not appear in a long substring of the source, so that would be a point for Data.ByteString.Lazy.KarpRabin], and I didn't know of any real use-case, so I did not pursue it further and considered it just an interesting curiosity. I suppose using a regex package as Johannes Waldmann suggested would be the easiest way (probably also faster). If you submit a feature request, however, I would look into expanding the offered functionality (but I'll be on vacation soon, so performance would have to wait; I suppose something could be gained there). > I guess that searching a text again and again for every substring is not > very efficient and it can be done in one run. Well, it is surprisingly efficient, compared to (my implementation of) the Karp-Rabin algorithm at least. > Although I may figure this out myself I think such a function could be > so common that someone has done it or can give me some tips. > > Thanks Cheers, Daniel _______________________________________________ Haskell-Cafe mailing list [email protected] http://www.haskell.org/mailman/listinfo/haskell-cafe
