On 16 November 2011 16:08, <[email protected]> wrote:
> Yes, the implementation based on UTF-8 vs UTF-16 version of PCRE would
> only differ on two lines, the UTF-16 -> UTF-8 and UTF-8 -> UTF-16
> conversion before and after the matching.
>
> I suggest we get started on this with the current version of PCRE, and
> hope that entices the PCRE team to work on a proper UTF-16 implementation.
>
> Anyone interested in jumping on this task?

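A minimal sketch of the offset half of that round trip: after matching on a UTF-8 copy of the subject, the byte offsets reported by the matcher have to be mapped back to UTF-16 code-unit offsets. The helper name `utf8OffsetToUtf16` is hypothetical (it is not part of Qt or PCRE), and the code assumes well-formed UTF-8 and an offset that falls on a character boundary.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not a Qt or PCRE API.
// Maps a byte offset in a well-formed UTF-8 string to the matching offset
// in UTF-16 code units. Assumes byteOffset lies on a character boundary.
std::size_t utf8OffsetToUtf16(const std::string &utf8, std::size_t byteOffset)
{
    std::size_t units = 0; // UTF-16 code units consumed so far
    std::size_t i = 0;     // current byte position in the UTF-8 string
    while (i < byteOffset && i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        if (lead < 0x80) {          // 1-byte sequence (ASCII)
            i += 1; units += 1;
        } else if (lead < 0xE0) {   // 2-byte sequence
            i += 2; units += 1;
        } else if (lead < 0xF0) {   // 3-byte sequence
            i += 3; units += 1;
        } else {                    // 4-byte sequence: a surrogate pair in UTF-16
            i += 4; units += 2;
        }
    }
    return units;
}
```

For example, in the UTF-8 encoding of "é€b" the character "b" starts at byte offset 5 but at UTF-16 offset 2. This linear scan is the per-match cost referred to as "paying for the offsets conversion"; caching the last converted position would be one way to reduce it.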
I can volunteer some time :)

But first: do we all (esp. Thiago, Lars) agree to use the UTF-8 version for now (and pay for the pattern/subject string/offsets conversions), and then write and enable a UTF-16 codepath once PCRE ships with proper support for it (by detecting its version at runtime)?

Also: what's the minimum PCRE version Qt should require? I see that Debian 6 (stable) uses 8.02 [1] and Ubuntu 10.04 LTS uses 7.8 [2]. For other distributions, of course, YMMV. Is it OK to depend on even more recent versions? For instance, PCRE 8.10 adds UCP support (basically making \w, \d, etc. match the corresponding Unicode properties), and PCRE 8.20 adds a JIT feature (which promises large performance benefits) [3] [4]. Again: should we depend on an "old" version, detect the actual one at runtime, and optionally enable those features?

About the API itself: would you prefer three classes (raw pattern -> compiled pattern -> result of a match), or only two (pattern -> result of a match)?

--
Giuseppe D'Angelo

[1] http://packages.debian.org/squeeze/libpcre3
[2] http://packages.ubuntu.com/lucid/libpcre3
[3] http://www.pcre.org/changelog.txt
[4] http://www.pcre.org/news.txt
_______________________________________________
Development mailing list
[email protected]
http://lists.qt-project.org/mailman/listinfo/development
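The runtime detection proposed in the message above could be sketched roughly as follows. This assumes the version string has the form "major.minor date" (the shape PCRE's `pcre_version()` is documented to return); the struct `PcreFeatures` and the function `featuresForVersion` are hypothetical names used only for illustration. The thresholds (8.10 for UCP, 8.20 for JIT) are the ones quoted in the message.

```cpp
#include <cstdio>
#include <string>

// Hypothetical type: which optional PCRE features may be switched on.
struct PcreFeatures {
    bool ucp; // Unicode properties for \w, \d, etc. (PCRE >= 8.10)
    bool jit; // JIT compilation (PCRE >= 8.20)
};

// Hypothetical helper: decide features from a version string such as
// "8.20 2011-10-21". Unparseable input conservatively enables nothing.
PcreFeatures featuresForVersion(const std::string &version)
{
    PcreFeatures features = { false, false };
    int major = 0;
    int minor = 0;
    if (std::sscanf(version.c_str(), "%d.%d", &major, &minor) != 2)
        return features;

    // "8.02" parses as minor 2 and "8.10" as minor 10, so the combined
    // value orders 7.8 < 8.02 < 8.10 < 8.20 as intended.
    const int combined = major * 100 + minor;
    features.ucp = combined >= 810;
    features.jit = combined >= 820;
    return features;
}
```

In real code the argument would come from `pcre_version()`. A version check alone is probably not sufficient for JIT, since the library can be built without it, so a build-time capability query would likely be needed as well.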
