Re: [Libreoffice] [Crazy Ideas] Discuss "Replace regexp parser with std library"

Thorsten Behrens Mon, 29 Nov 2010 05:23:15 -0800

Joe Smith wrote:
> I've looked at the code a bit, and it seems like there is indeed only one
> point of contact with the rest of the suite: textsearch.cxx, which handles
> all types of text searches (normal, regexp & fuzzy) and calls
> Regexpr::re_search(), which calls re_match2() to run the actual regexp match.
> 
> So the structure makes it easy to replace the regexp code in one place.
> 
> Unfortunately, the way the functions work does not match well with the
> Boost RE classes, although I'm sure it would be possible with an interface
> layer.
> 
> For example, the Boost engine handles locale-specific issues internally,
> whereas OOo's engine knows almost nothing about character case or
> multi-character sequences. Instead, it preps the text to be searched by
> running it through a filter. I don't understand the i18n & character
> encoding issues well enough to guess what that filter is actually doing or
> how it should be handled.
> 
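A rough sketch of what such an interface layer could look like, assuming std::regex (C++11) as the replacement engine. The function name re_search_std and its return convention (match start offset or -1, end offset through an out-parameter) are hypothetical and only loosely modelled on a C-style re_search(); they are not the actual Regexpr::re_search() signature used by textsearch.cxx.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical adapter: name, signature and return convention are
// illustrative assumptions, not the real Regexpr::re_search().
// Searches `text` for `pattern`, starting at offset `start`.
// Returns the match start offset and stores the match end in `matchEnd`,
// or returns -1 when there is no match.
// Note: std::regex throws std::regex_error for an invalid pattern.
long re_search_std(const std::string& text, const std::string& pattern,
                   std::size_t start, std::size_t& matchEnd)
{
    assert(start <= text.size());
    // ECMAScript is std::regex's default grammar; the old engine's syntax
    // may differ, so a real layer would also have to translate patterns.
    const std::regex re(pattern, std::regex::ECMAScript);
    std::smatch m;
    // match_prev_avail tells the engine that characters before `start`
    // exist, so assertions such as \b can look at the preceding character.
    const auto flags = start > 0 ? std::regex_constants::match_prev_avail
                                 : std::regex_constants::match_default;
    const auto first =
        text.begin() + static_cast<std::string::difference_type>(start);
    if (!std::regex_search(first, text.end(), m, re, flags))
        return -1;
    // m.position(0) is relative to `first`, so add `start` back.
    const long matchStart = static_cast<long>(start + m.position(0));
    matchEnd = static_cast<std::size_t>(matchStart + m.length(0));
    return matchStart;
}
```

For example, searching "foo bar baz" for "ba." from offset 5 returns 8 and sets the end offset to 11. Keeping the old offset-based convention means the caller in textsearch.cxx would not need to know which engine sits behind it.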
Hi Joe,


hm, then I think a combination of those two approaches might be a
winning strategy: LibO uses ICU for all that nifty transliteration
stuff and whatnot.
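The two approaches from Joe's mail can be contrasted in a small sketch, assuming std::regex and plain single-byte text: one lets the engine handle case internally, the other runs the text through a filter first and then matches case-sensitively. toLowerAscii is a hypothetical stand-in for the real ICU-based transliteration, not an existing LibO function.

```cpp
#include <cassert>
#include <cctype>
#include <regex>
#include <string>

// Approach A (engine-internal): let the regex engine handle case via icase.
// std::regex applies its locale's case rules per char, so it cannot see
// multi-byte UTF-8 characters or multi-character case mappings.
bool search_engine_icase(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern, std::regex::ECMAScript | std::regex::icase);
    return std::regex_search(text, re);
}

// Approach B (pre-filter): fold the text first, then match case-sensitively.
// Hypothetical stand-in for an ICU transliteration step: it maps byte for
// byte, so offsets in the filtered text equal offsets in the original.
std::string toLowerAscii(std::string s)
{
    for (char& c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// The pattern is assumed to be written in lower case already: running the
// pattern through the same filter would corrupt escapes such as \D or \W.
bool search_prefiltered(const std::string& text, const std::string& lowerPattern)
{
    const std::regex re(lowerPattern, std::regex::ECMAScript);
    return std::regex_search(toLowerAscii(text), re);
}
```

The byte-for-byte filter keeps match positions trivially valid. A real folding such as German "ß" to "ss" changes the text length, so a combined design would need to keep a map from filtered offsets back to original ones.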

I notice that newer Boost versions also optionally support ICU, so
maybe that already gives us good enough coverage. I'd be tempted to
just give it a whirl and add it as an optional, experimental feature
for people to play with.
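One possible way to ship this as an optional, experimental feature is a build-time switch, sketched here under assumptions: the macro name LIBO_EXPERIMENTAL_BOOST_ICU_REGEX is hypothetical, and the Boost branch assumes a Boost.Regex build with ICU support (boost/regex/icu.hpp, boost::make_u32regex, boost::u32regex_search). When the macro is undefined, only the std::regex fallback is compiled.

```cpp
#include <cassert>
#include <string>

// Hypothetical build switch; not an existing LibreOffice configure option.
#ifdef LIBO_EXPERIMENTAL_BOOST_ICU_REGEX
#include <boost/regex/icu.hpp>

// Boost.Regex with ICU: u32regex matches on Unicode code points, and a
// std::string argument is interpreted as UTF-8.
bool regex_contains(const std::string& utf8Text, const std::string& utf8Pattern,
                    bool ignoreCase)
{
    boost::regex_constants::syntax_option_type opts =
        boost::regex_constants::perl;
    if (ignoreCase)
        opts |= boost::regex_constants::icase;
    const boost::u32regex re = boost::make_u32regex(utf8Pattern, opts);
    return boost::u32regex_search(utf8Text, re);
}
#else
#include <regex>

// Fallback: plain std::regex, which works on chars and knows nothing about
// UTF-8, so it is only a rough stand-in for the Unicode-aware variant.
bool regex_contains(const std::string& utf8Text, const std::string& utf8Pattern,
                    bool ignoreCase)
{
    std::regex_constants::syntax_option_type opts = std::regex::ECMAScript;
    if (ignoreCase)
        opts |= std::regex::icase;
    const std::regex re(utf8Pattern, opts);
    return std::regex_search(utf8Text, re);
}
#endif
```

Both branches expose the same regex_contains() signature, so callers do not change when the experimental engine is switched on or off.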

Cheers,

-- Thorsten


_______________________________________________
LibreOffice mailing list
libreoffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice
