On Mon, Apr 28, 2014 at 8:40 AM, Jonathan Wakely <jwak...@redhat.com> wrote:
> I've been looking through the regex code and have a few ideas for
> simplifications or optimisations that I'd like to share.

Thanks :)

> This first patch is for _BracketMatcher. We only use std::bitset when
> is_same<_CharT, char> so 8 * sizeof(_CharT) should be __CHAR_BIT__
> instead. We also only user _UnsignedCharT when is_same<_CharT, char>
> so that can just be simplified to unsigned char.

Yes, since _UnsignedCharT is just used as indexes, we can always use a
larger unsigned integer instead. Maybe "size_t" is a better choice?

I'm not sure if we'll have a wchar_t cache (bitset<65536>) in the future ;)

> The contents of _BracketMatcher::_M_char_set are not sorted and can
> contain duplicates in the current code. Making that a sorted, unique
> list in _BracketMatcher::_M_ready() allows a binary search instead of
> linear search. This improves worst case performance for pathological
> regular expressions like std::wregex('['+std::wstring(1000, 'a')+"b]")
> but I'm not sure if it helps in the common case.

Trust me, in common case, even it's a bit slower, it won't be
observable. So it's just OK.

> Finally, in the non-char case the _CacheT member is an unused empty
> object, so having that as the first member requires 7 bytes of
> padding. Re-ordering the members reduces the size of a non-char
> _BracketMatcher by 8 bytes (but it's still a whopping 96 bytes).
>
> (For a char _BracketMatcher the bitset cache makes it 128 bytes,
> this patch doesn't change that).

This is OK too.

Rant: I can't understand why the committee doesn't keep things simple
by letting struct { } has 0 size. I used to implement my own tuple by
specializing tuple<> as an empty struct. The thing is, one tuple
shouldn't `has it` as a member, or it will get extra padding; at last
I used inheritance.


-- 
Regards,
Tim Shen

Reply via email to