Hi Tim,

On Thu, Nov 04, 2021 at 07:12:04PM +0100, Tim Düsterhus wrote:
> Your patch is already merged and the bug is fixed. However I'd like to
> comment on the reasons behind why I refactored the whole function to use the
> ist API:
> 
> I *strongly* dislike code that just works because of some implicit
> assumptions that might or might not hold after future changes. This is C,
> every messed up boundary check can easily result in some major issues.
> 
> In this specific case the implicit assumption is that all supported
> algorithms are exactly 5 characters long. This is true with the current
> implementation in HAProxy which just supports the HS, RS, ES, and PS
> families. However the JOSE specification [1] also specifies other values for
> the 'alg' field (though most of them for encrypted instead of just signed
> tokens) and they have differing lengths.
> 
> If someone in the future wants to add support for a 6 character algorithm,
> then they need to remember to add the length check to not run into the same
> strncmp() issue like you did. My experience is that if a mistake is made
> once, it is going to be made again.
> 
> The ist API already contains a large number of functions for *safe* handling
> of strings that are not NUL terminated. It is very hard to misuse them,
> because they all include the appropriate checks. If you see isteq(dynamic,
> ist("static")) then it is very clear that this will match the string
> "statis" exactly. For strncmp you will need to perform the length check
> yourself. You also need to remember to manually subtract 1 for the trailing
> NUL when using sizeof, whereas this is done automatically by the ist()
> function.
> 
> While I am not an expert in optimization and getting the last percent of
> performance out of a function, I don't think the ist API is going to be
> measurably slower than strncmp. It is used everywhere within HTX after all!
> For the JWT the OpenSSL crypto is going to take up the majority of the time
> anyway.
> 
> Hope this help to clarify why I'm strongly pushing the use of the ist API
> and why I've contributed a fair number of additional functions in the past
> when I encountered situations where the current functions were not ergonomic
> to use. One example is the istsplit() function which I added for
> uri_normalizer.c. It makes tokenization of strings very easy and safe.

I overall generally agree with your arguments above as I created ist exactly
to address such trouble that always end up with someone spend 3 days on a
but in someone else's code just to discover a logic mistake such as a
string function called on a string that's not always nul-terminated, or
reverse scanning that can occasionally continue before the beginning of
the string etc.

However, there is an important aspect to keep in mind, which is that Rémi
is the maintainer of this part and he needs to use what he feels at ease
with. I can easily understand that he's not much familiar with the usage
of ist. While the functions are correctly documented and generally useful,
their coverage is not always complete (and you had a few times to add new
ones yourself), and we don't have a simple index of what's available with
a one-liner description, which doesn't help to get familiar with them.
In addition there are places where the input text is still made of nul
terminated strings, that require an extra conversion that can also be
error-prone, or just would add some code whose benefit is not immediately
perceptible.

I, too, hope that over time we make most of the code that deals with
strings coming from user data evolve towards a more generalized use of
ist for the reasons you mention. Safer, easier, friendlier API when
you're in a code area that already makes use of them. I will just not
force anyone to do that, though I can encourage by showing how to use
it. However you can be sure that I'll complain very loudly if I have
to debug something that was caused by unsafe code that could have been
avoided this way.

The performance aspect is indeed something that ought not be neglected
in two domains:
  - CPU: the functions are made to be inlined, and as their end is
    known, usually this results in trivial operations that can be
    much faster than the usual ones. However the functions are designed
    to deal optimally with short strings (like keywords, cookies etc)
    and to result in optimal code for small loops (i.e. no function
    call and the smallest possible setup code). Using them on large
    blocks will result in lower performance than regular functions.
    Looking up a string inside a body for example ought not be done
    with them ;

  - size: the strings use two parts and they are thus larger than a
    single pointer. In some structures the space could be extremely
    tight and the cost of storing the length can sometimes cancel some
    of the gains. But this is quite rare as almost all of our strings
    are accompanied with a length nowadays.

My general rule of thumb on this is:
  - if you're dealing with small strings (around 200 chars or less)
    that you manipulate very often and for which storing the size is
    not a problem, better use ist.

  - if you're manipulating strings a lot in a function (tokenizing, etc),
    it's often worth turning them to ist before performing the operations
    as the pointer and length will then just become registers that are
    passed and updated between calls. In addition many times the compiler
    can detect constants and optimize loops.

  - otherwise it's probably not worth it.

There are some cases where a function like this one:

  int is_blah_str()
  {
        return strcmp(get_blah_str(), "blah") == 0;
  }

can be more efficiently done like this:

  int is_blah_ist()
  {
        return isteq(get_blah_ist(), ist("blah"));
  }

and provide more optimal code like this:

0000000000000080 <is_blah_ist>:
  80:   53                              push   %rbx
  81:   31 db                           xor    %ebx,%ebx
  83:   31 c0                           xor    %eax,%eax
  85:   e8 00 00 00 00                  callq  8a <is_blah_ist+0xa>  // 
get_blah_ist
  8a:   48 83 fa 04                     cmp    $0x4,%rdx
  8e:   75 0b                           jne    9b <is_blah_ist+0x1b>
  90:   31 db                           xor    %ebx,%ebx
  92:   81 38 62 6c 61 68               cmpl   $0x68616c62,(%rax)
  98:   0f 94 c3                        sete   %bl
  9b:   89 d8                           mov    %ebx,%eax
  9d:   5b                              pop    %rbx
  9e:   c3                              retq   

where the constant was turned into a single 32-bit integer comparison.
This is more efficient both in terms of CPU and code size. But in order
for this to happen more often, we sometimes have to provide hints to
the compiler, and the code really needs to be improved (this will happen
over time).

By the way I think that I should create an "api" sub-directory under
"doc/internals" to place the docs about the internal API parts, such
as buffers, htx, filters, initcalls, and the missing ist. This could
probably save developers some time and encourage others to add new
stuff there from time to time.

Regards,
Willy

Reply via email to