Hi.

It looks like this works as designed, because the "str*cmp" functions are
currently used for matching. Your solution with the hex converter looks like a
reasonable way to work around the '\0' byte issue.
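
To illustrate the general C behavior (this is only a sketch, not HAProxy's actual code; the helper names contains_str and hex_encode are made up for this mail): any str* function treats the first '\0' as the end of the data, so a payload that starts with a null byte looks empty. After hex encoding, the same bytes contain no '\0' and a string search works again.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: 1 if needle occurs in the NUL-terminated view of buf.
 * Like every str* function, strstr() stops at the first '\0' in buf. */
static int contains_str(const char *buf, const char *needle)
{
    assert(buf != NULL && needle != NULL);
    return strstr(buf, needle) != NULL;
}

/* Hypothetical helper: hex-encode len bytes of src into dst (uppercase).
 * dst must have room for 2 * len + 1 bytes, so the output is twice as
 * large as the input plus the terminator. */
static void hex_encode(const unsigned char *src, size_t len, char *dst)
{
    static const char digits[] = "0123456789ABCDEF";
    size_t i;

    for (i = 0; i < len; i++) {
        dst[2 * i]     = digits[src[i] >> 4];
        dst[2 * i + 1] = digits[src[i] & 0x0F];
    }
    dst[2 * len] = '\0';
}
```

With the 6 bytes of '\0<tag>' as input, contains_str() on the raw buffer does not find "<tag>", while contains_str() on the hex-encoded buffer finds "3C7461673E".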

http://git.haproxy.org/?p=haproxy-2.0.git;a=blob;f=src/pattern.c;hb=6d9a455da17251c34d3c552f2a963447f52fdd80#l724

HAProxy could try to use memcmp, but the "sub" match would then work differently.

https://stackoverflow.com/questions/13095513/what-is-the-difference-between-memcmp-strcmp-and-strncmp-in-c/13095574#13095574
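
A minimal sketch of what a memcmp-based substring search could look like, assuming the sample is handled as a pointer plus an explicit length (find_bytes is a hypothetical name, not an existing HAProxy function). One difference to keep in mind: standard C has no case-insensitive memcmp, so a match built this way is always byte-exact.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the offset of the first occurrence of
 * needle in hay, or -1 if it does not occur. Both buffers are described
 * by explicit lengths, so '\0' is compared like any other byte. */
static long find_bytes(const void *hay, size_t hay_len,
                       const void *needle, size_t needle_len)
{
    const unsigned char *h = hay;
    size_t i;

    assert(hay != NULL && needle != NULL);

    if (needle_len == 0)
        return 0;               /* empty needle matches at offset 0 */
    if (needle_len > hay_len)
        return -1;              /* needle cannot fit */

    for (i = 0; i + needle_len <= hay_len; i++) {
        if (memcmp(h + i, needle, needle_len) == 0)
            return (long)i;
    }
    return -1;
}
```

For the 6-byte payload '\0<tag>' and the 5-byte pattern '<tag>', this returns offset 1 instead of giving up at the leading null byte.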

From a binary point of view, isn't '\0<tag>' different from '<tag>'?

Maybe the algorithm mentioned in the code,
https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm,
would fix this behavior, but as far as I know no one is working on such a
patch.
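
For reference, a rough sketch of the idea on length-delimited buffers. Note that this is the simpler Boyer-Moore-Horspool variant (bad-character shift table only), not the full Boyer-Moore algorithm, and horspool_find is a made-up name, not something that exists in HAProxy.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: Boyer-Moore-Horspool search over raw bytes.
 * Returns the offset of the first match of needle in hay, or -1. */
static long horspool_find(const unsigned char *hay, size_t hay_len,
                          const unsigned char *needle, size_t needle_len)
{
    size_t shift[UCHAR_MAX + 1];
    size_t pos = 0;
    size_t i;

    assert(hay != NULL && needle != NULL);

    if (needle_len == 0)
        return 0;
    if (needle_len > hay_len)
        return -1;

    /* Default shift: skip the whole needle length. */
    for (i = 0; i <= UCHAR_MAX; i++)
        shift[i] = needle_len;
    /* Bytes occurring in the needle (except its last byte) shift less. */
    for (i = 0; i + 1 < needle_len; i++)
        shift[needle[i]] = needle_len - 1 - i;

    while (pos + needle_len <= hay_len) {
        /* Compare the current window from right to left. */
        i = needle_len;
        while (i > 0 && hay[pos + i - 1] == needle[i - 1])
            i--;
        if (i == 0)
            return (long)pos;
        /* Shift according to the byte aligned with the needle's end. */
        pos += shift[hay[pos + needle_len - 1]];
    }
    return -1;
}
```

Because the table is indexed by unsigned char and the lengths are explicit, a '\0' in the payload is just another byte value here.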

It would be nice if you could send us a patch to fix the doc.

Regards
Aleks

Nov 30, 2019 11:35:24 AM Mathias Weiersmüller (cyberheads GmbH) 
<mathias.weiersmuel...@cyberheads.ch>:

> (CCing Thierry Fournier as maintainer of the pattern matching part)
> 
> 
> > We use HAProxy in TCP Mode for non-HTTP protocols.
> > 
> > The request of one particular protocol looks like this:
> > 
> > - length of message (binary value, 4 bytes long)
> > - binary part (40-200 bytes)
> > - XML part
> > 
> > Goal: We want to use a particular backend when the XML part of the request
> > contains the string "<tag>".
> > 
> > We used this ACL:
> > acl tag_found req.payload(0,0) -m sub <tag>
> > 
> > The problem:
> > The substring matching stops on a null byte (\0) in a binary fetch. We
> > always have this case (the request normally starts with null bytes).
> > Therefore, the match never succeeds. As there might be null bytes in the
> > binary part too, we cannot just start the payload fetch after byte 4.
> > 
> > ==========================
> > frontend fe_test
> > bind *:3000
> > 
> > tcp-request inspect-delay 5s
> > 
> > acl content_present req_len gt 0
> > acl tag_found req.payload(0,0) -m sub <tag>
> > 
> > tcp-request content accept if content_present
> > tcp-request content reject
> > 
> > # depending on if the payload contains the string "<tag>", we use different
> > # backends
> > # right now, the two backends are exactly the same.
> > use_backend be_tag if tag_found
> > default_backend be_default
> > 
> > backend be_tag
> > server srv_1:4000
> > 
> > backend be_default
> > server srv_1:4000
> > 
> > Test cases:
> > (tested on versions 2.0.10, 1.5.18)
> > echo -e '<tag>' | nc 127.0.0.1 3000 # will use backend be_tag
> > echo -e '\0<tag>' | nc 127.0.0.1 3000 # will use backend be_default, but should use be_tag
> > ==========================
> > 
> > Workaround:
> > =>convert payload into hexified string, parse against hex:
> > acl tag_found req.payload(0,0),hex -m sub 3C7461673E # this is <tag> in hexadecimal
> > 
> > Dear list members, these are the questions I am twisting my mind with. Do
> > you have a good take on these?
> > 
> > - Is there another (better) way to do a substring match on a payload which 
> > contains Null bytes?
> > - Would another, new match method make sense here (something like sub_bin ? 
> > )
> > - Do we run into a problem with the hex conversion because the converted
> > sample is twice the size of the original (and maybe bigger than bufsize)?
> > 
> > 
> 
> If this behavior is intended, then the configuration manual (7.1.3 Matching 
> strings) should be updated to reflect this:
> 
> Do not use string matches for binary fetches which might contain null bytes
> (0x00), as the comparison stops at the occurrence of the first null byte.
> Instead, convert the binary fetch to a hex string with the hex converter
> first.
> 
> Example:
> acl tag_found req.payload(0,0),hex -m sub 3C7461673E # this is <tag> in hexadecimal
> 
> Does that make sense?
> 
> Best regards
> 
> Mathias
> 

