Re: [Development] Regular expression libraries for QRegExp

Giuseppe D'Angelo Mon, 21 Nov 2011 07:45:56 -0800

On 16 November 2011 16:08,  <[email protected]> wrote:
> Yes, the implementation based on UTF-8 vs UTF-16 version of PCRE would
> only differ on two lines, the UTF-16 -> UTF-8 and UTF-8 -> UTF-16
> conversion before and after the matching.
>
> I suggest we get started on this with the current version of PCRE, and
> hope that entices the PCRE team to work on a proper UTF-16 implementation.
>
> Anyone interested in jumping on this task?


I can volunteer some time :)

But first: do we all (esp. Thiago, Lars) agree to use the UTF-8
version for now (and pay for the pattern/subject string/offsets
conversions) and then write and enable a UTF-16 codepath when PCRE
ships with proper support for it (by detecting its version at
runtime)?
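
To make the cost of "pattern/subject string/offsets conversions" concrete,
here is a rough sketch in plain C++ (no Qt, no PCRE) of converting a UTF-16
subject to UTF-8 while recording, for every UTF-8 byte, the UTF-16 offset
it came from. The names Utf8Subject, toUtf8 and utf16Offset are
hypothetical, and replacing lone surrogates with U+FFFD is an assumption,
not something decided in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A UTF-16 subject converted to UTF-8 for a byte-oriented matcher.
struct Utf8Subject {
    std::string bytes;                   // UTF-8 encoded subject
    std::vector<std::size_t> unitAtByte; // byte offset -> UTF-16 offset
};

Utf8Subject toUtf8(const std::u16string &s)
{
    Utf8Subject out;
    std::size_t i = 0;
    while (i < s.size()) {
        const std::size_t start = i; // UTF-16 offset of this character
        char32_t cp = s[i++];
        if (cp >= 0xD800 && cp <= 0xDBFF && i < s.size()
            && s[i] >= 0xDC00 && s[i] <= 0xDFFF) {
            // Valid surrogate pair: combine into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i++] - 0xDC00);
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; // lone surrogate (assumed policy: substitute)
        }
        if (cp < 0x80) {
            out.bytes += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out.bytes += static_cast<char>(0xC0 | (cp >> 6));
            out.bytes += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out.bytes += static_cast<char>(0xE0 | (cp >> 12));
            out.bytes += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out.bytes += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out.bytes += static_cast<char>(0xF0 | (cp >> 18));
            out.bytes += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out.bytes += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out.bytes += static_cast<char>(0x80 | (cp & 0x3F));
        }
        // Every byte just appended maps back to the same UTF-16 offset.
        out.unitAtByte.resize(out.bytes.size(), start);
    }
    out.unitAtByte.push_back(s.size()); // one-past-the-end offset
    return out;
}

// Translate a byte offset reported by the matcher back to UTF-16.
std::size_t utf16Offset(const Utf8Subject &subject, std::size_t byteOffset)
{
    assert(byteOffset < subject.unitAtByte.size());
    return subject.unitAtByte[byteOffset];
}
```

The table costs one size_t per UTF-8 byte; scanning the string again for
each reported offset would trade that memory for time.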

Also: what's the minimum PCRE version Qt should require? I see that
Debian 6 (stable) uses 8.02 [1], Ubuntu 10.04 LTS uses 7.8 [2]. For
other distributions of course YMMV. Is it OK to depend on even more
recent versions? For instance, PCRE 8.10 adds UCP support (basically
it makes \w, \d, etc. match the corresponding Unicode properties), and
PCRE 8.20 adds a JIT feature (which promises large performance
benefits) [3] [4].
Again: should we depend on an "old" version as the baseline, detect the
actual one at runtime, and optionally enable those features?
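
A minimal sketch of the runtime detection, assuming the version string has
the form that pcre_version() returns (for example "8.20 2011-10-21"). The
block only parses such a string, so it builds without PCRE installed;
PcreVersion, parsePcreVersion and atLeast are hypothetical names. The
thresholds are the ones quoted above (8.10 for UCP, 8.20 for JIT).

```cpp
#include <cassert>
#include <cstdio>

struct PcreVersion {
    int majorVer = 0;
    int minorVer = 0;
};

// Parse the leading "major.minor" of a PCRE version string.
// Returns false if the string does not start with two numbers.
bool parsePcreVersion(const char *versionString, PcreVersion *out)
{
    assert(versionString != nullptr && out != nullptr);
    return std::sscanf(versionString, "%d.%d",
                       &out->majorVer, &out->minorVer) == 2;
}

// True if v is at least major.minor.
bool atLeast(const PcreVersion &v, int major, int minor)
{
    return v.majorVer > major
        || (v.majorVer == major && v.minorVer >= minor);
}

// Feature gates, using the versions mentioned in this mail.
bool supportsUcp(const PcreVersion &v) { return atLeast(v, 8, 10); }
bool supportsJit(const PcreVersion &v) { return atLeast(v, 8, 20); }
```

In real code the input would come from pcre_version(); with the versions
listed above, Debian's 8.02 and Ubuntu's 7.8 would get neither feature.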

About the API itself: would you prefer three classes (raw pattern
-> compiled pattern -> result of a match), or only two (pattern ->
result of a match)?
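
To illustrate the three-class shape, here is a hypothetical sketch. The
class names RawPattern, CompiledPattern and MatchResult are invented for
this example, and std::regex stands in for PCRE purely so the code is
self-contained; none of this is a proposed Qt API.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Result of a match: owns copies of the captured substrings, so it
// stays valid after the subject string goes away.
class MatchResult {
public:
    MatchResult() = default;
    explicit MatchResult(const std::smatch &m)
    {
        for (const auto &sub : m)
            captures_.push_back(sub.str());
    }
    bool hasMatch() const { return !captures_.empty(); }
    const std::string &captured(std::size_t n) const
    {
        assert(n < captures_.size());
        return captures_[n];
    }
private:
    std::vector<std::string> captures_;
};

// Compiled pattern: the expensive step, done once and reused.
class CompiledPattern {
public:
    explicit CompiledPattern(const std::string &source) : regex_(source) {}
    MatchResult match(const std::string &subject) const
    {
        std::smatch m;
        if (!std::regex_search(subject, m, regex_))
            return MatchResult();
        return MatchResult(m);
    }
private:
    std::regex regex_;
};

// Raw pattern: just the source text, cheap to copy and store.
class RawPattern {
public:
    explicit RawPattern(std::string source) : source_(std::move(source)) {}
    CompiledPattern compile() const { return CompiledPattern(source_); }
private:
    std::string source_;
};
```

In the two-class variant, RawPattern and CompiledPattern would merge into
one class that compiles internally, which hides the compile step from the
caller at the price of a less explicit cost model.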
-- 
Giuseppe D'Angelo

[1] http://packages.debian.org/squeeze/libpcre3
[2] http://packages.ubuntu.com/lucid/libpcre3
[3] http://www.pcre.org/changelog.txt
[4] http://www.pcre.org/news.txt
_______________________________________________
Development mailing list
[email protected]
http://lists.qt-project.org/mailman/listinfo/development
