Re: Another patch, for discussion tho

Bruce Korb Sat, 21 Apr 2012 10:14:45 -0700


So after futzing with timing a bit, I figured out the following:


1. These pre-computed tables _can_ outperform "strpbrk", but
   only if the number of characters skipped over is roughly a
   dozen or two.  Beyond that, single-instruction testing and
   hand-crafted assembly code win out.  (These generated tables
   require a load, mask and test instead of just a load and test.)

2. The setup cost for a single-character strpbrk break-on string
   is *MUCH* larger than the setup cost for a string of two or
   more characters.  Likely, someone tried to optimize that setup,
   but the ordinary setup is efficient enough that the
   optimization turns out to be a pessimization.

3. It was never about efficiency of execution anyway.  It is quite
   unlikely that time-critical code is going to be scanning over
   strings.  If it must, then it should use strpbrk/strcspn.
   Maybe for really critical scanning code, variants of those
   could split the interface into setup_strpbrk and run_strpbrk.

   I suppose, in retrospect, I could do the same thing and
   achieve the same efficiency.  "SETUP_whatever_SCAN()"
   populates an array of bytes that merely need to be tested
   for "true" and "false" instead of masking.  Entirely doable,
   but not today.
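
To make the difference between points 1 and 3 concrete, here is a
minimal sketch in C.  It is not the code from the patch: the names
class_mask, CC_NAME, CC_SPACE, setup_scan and run_scan are all
hypothetical, and the shared table is filled at run time here only to
keep the example self-contained (the tables discussed above are
pre-computed).  The first scanner does a load, mask and test per
character; the split setup/run pair does a load and test.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical class bits; the generated tables may use other values. */
enum { CC_NAME = 1, CC_SPACE = 2 };

/* Shared table: one bit per character class, indexed by byte value. */
static uint8_t class_mask[256];

/* Stand-in for the pre-computed table: fill it once at startup. */
static void init_class_mask(void)
{
    const char *name  = "abcdefghijklmnopqrstuvwxyz"
                        "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_";
    const char *space = " \t\n\r\f\v";
    const char *p;

    memset(class_mask, 0, sizeof class_mask);
    for (p = name; *p != '\0'; p++)
        class_mask[(unsigned char)*p] |= CC_NAME;
    for (p = space; *p != '\0'; p++)
        class_mask[(unsigned char)*p] |= CC_SPACE;
}

/* Point 1: load, mask and test.  Entry 0 is never set, so the
   loop always stops at the terminating NUL. */
static const char *skip_class_masked(const char *s, unsigned cls)
{
    while (class_mask[(unsigned char)*s] & cls)
        s++;
    return s;
}

/* Point 3, setup half: expand one class into a plain true/false
   byte table so the scan loop needs no mask. */
static void setup_scan(uint8_t tbl[256], unsigned cls)
{
    int i;

    for (i = 0; i < 256; i++)
        tbl[i] = (class_mask[i] & cls) != 0;
}

/* Point 3, run half: load and test only. */
static const char *run_scan(const uint8_t tbl[256], const char *s)
{
    while (tbl[(unsigned char)*s])
        s++;
    return s;
}
```

After init_class_mask(), skip_class_masked("foo_1 bar", CC_NAME)
stops at the blank before "bar", and so does run_scan() with a table
built by setup_scan(tbl, CC_NAME).  The split form pays for a
256-byte pass up front, so it would only be worth it when the same
class is scanned many times.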

This whole thing _is_ about efficiency -- but efficiency of
expression, and also flexibility.  (Change the characters
in a classification and the main code now accepts the new
character set without alteration.  E.g. add '$' to the set
of name characters for "C" and now you are VMS compatible.)
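
As an illustration of that flexibility, here is a small C sketch.
The macro and function names (C_NAME_CHARS, VMS_NAME_CHARS,
setup_name_scan, name_span) are made up for this example and are not
taken from the patch.  The character set is defined in exactly one
place; the scanning function never mentions individual characters.

```c
#include <stddef.h>
#include <string.h>

/* The one place where the classification is defined (hypothetical). */
#define C_NAME_CHARS   "abcdefghijklmnopqrstuvwxyz" \
                       "ABCDEFGHIJKLMNOPQRSTUVWXYZ" \
                       "0123456789_"
/* VMS-style names: the same set plus '$'. */
#define VMS_NAME_CHARS C_NAME_CHARS "$"

/* True/false lookup table, indexed by byte value. */
static unsigned char is_name_char[256];

/* Populate the table from whichever set is in force. */
static void setup_name_scan(const char *set)
{
    memset(is_name_char, 0, sizeof is_name_char);
    for (; *set != '\0'; set++)
        is_name_char[(unsigned char)*set] = 1;
}

/* Length of the leading run of name characters.  This code is
   unchanged no matter which set was loaded; entry 0 is never set,
   so the loop stops at the terminating NUL. */
static size_t name_span(const char *s)
{
    size_t n = 0;

    while (is_name_char[(unsigned char)s[n]])
        n++;
    return n;
}
```

With C_NAME_CHARS loaded, name_span("SYS$OUTPUT = 1") returns 3,
stopping at the '$'.  With VMS_NAME_CHARS loaded, the same call
returns 10, and name_span() itself was not touched.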

So where would the right place be for a beast like this?
