On Wed, Jun 21, 2017 at 1:08 AM, William A Rowe Jr <[email protected]> wrote:
>
> But there were only two questionable values for \0, and in this case the
> answer is obvious. Invert the rule to a TOKEN char from the rather dubious
> TOKEN_STOP definition. Solved.
... for trunk, IMO. I don't suggest we touch this code again on
2.4 branch. For trunk, when I encounter the concept of a STOP
I envision there is a set of defined delimiters (e.g. punctuation/ctrls)
which serve as terminators. What HTTP defined was the set of
token characters in the bnf. T_HTTP_NOT_TOKEN would have
been a better name for the existing rule. T_HTTP_TOKEN is
simply a better understood rule.
On to whether element [NUL] tests aught to be trustable or
unused; My research (unsure if this was shared earlier) starts
with tests of isfn('\0');
isalnum false
isalpha false
iscntrl true
isdigit false
isgraph false
islower false
isprint false
ispunct false
isspace false
isupper false
isxdigit false
isascii true
isblank false
In each case above, the isfn('\0') results are plainly obvious, right
down to iscntrl(NUL)'s and isascii(NUL)'s result. Our resulting
test_char table values include element [NUL] as well, and it needs
to be sensible for the many purposes...
Our declarations;
T_ESCAPE_SHELL_CMD false (correct, \0 is EOS, not a char)
T_ESCAPE_PATH_SEGMENT false (correct, \0 is EOS, not a char)
T_OS_ESCAPE_PATH false (correct, \0 is EOS, not a char)
T_HTTP_TOKEN_STOP true (correct, \0 is a non-token byte)
T_ESCAPE_LOGITEM false (correct, \0 is EOS not a char)
T_ESCAPE_FORENSIC true (interesting/debatable)
T_ESCAPE_URLENCODED false (correct, \0 is EOS not a char)
T_HTTP_CTRLS true (correct, \0 is NUL, non VCHAR)
T_VCHAR_OBSTEXT false (correct, \0 is NUL < 0x80)
In the _ESCAPE's we are escaping a string, and NUL's terminate
a string and are not escaped. Any NUL byte traveling the wire
needs to be evaluated by the appropriate TOKEN, CTRLS or
OBSTEXT rule by HTTP RFC.
The T_ESCAPE_FORENSIC case inverts the general rule for
command lines and paths we are constructing, but as a symbol
over the wire, I can agree with the idea of logging an escaped \0
in the case of a NUL input byte. This differs from logging in that
our logging API has no counted-string semantic (if it did, I guess
we would encrypt \0's.)