bug#16912: [PATCH] no longer use CSET for non-UTF8 locale in DFA engine

Norihiro Tanaka Sat, 01 Mar 2014 17:24:25 -0800

Hi Paul

Thank you for checking the patch.


> First, why does the first patch add those four using_utf8 calls to
> parse_bracket_exp?  Isn't that optimization valid regardless of
> whether the multibyte encoding is UTF-8?

The optimization in addtok that turns an MBCSET into a CSET is only completed
in UTF-8 locales.  In non-UTF-8 locales, even if work_mbc->cset is defined,
the token is still treated as an MBCSET rather than a CSET.  So unless the
MBCSET is rewritten into CSET and OR tokens, dfa keeps the MBCSET until the
end and returns backref, which forces a fallback to the slower regex matcher.
I want to avoid that.

However, I don't understand why the optimization is not completed in
non-UTF-8 locales.  Can you explain why?

Norihiro
