Hi Paul

Thank you for checking the patch.

> First, why does the first patch add those four using_utf8 calls to
> parse_bracket_exp?  Isn't that optimization valid regardless of
> whether the multibyte encoding is UTF-8?

The optimization in addtok that changes an MBCSET into a CSET is completed
only in UTF-8 locales.  In non-UTF-8 locales, even if work_mbc->cset is
defined, the token is treated as an MBCSET rather than a CSET.  So unless
the MBCSET is replaced by a CSET or by an OR of tokens, dfa will keep the
MBCSET until the end and return backref.  I want to avoid that.

However, I don't understand why the optimization isn't completed in
non-UTF-8 locales only.  Can you explain it?

Norihiro



