Henrik Nordstrom wrote:

> > The attached lex program will parse a URL-encoded string (from stdin)
> > and emit a sequence of shell commands which define environment
> > variables.
> 
> You aren't seriously saying that you feed CGI input to a shell, are you?
> 
> This LEX program both contains a parsing bug on correct input (the
> escape syntax is %## not %##%) and is way too easy to fool into embedding
> custom shell commands in the output (the simplest case is ;ls<newline>)

Sorry. The code was a development version of something that I wrote a
while back. I mistook it for the finished version (which does escape
the shell metacharacters correctly, and has the correct escape
syntax).

> A more interesting question (and what I believe the question was) is how
> to parse CGI input in a C program. A lex program that emits unfiltered
> shell variable definitions one character at a time does not help a C
> programmer much.

Well, you can keep the regexps as they are (apart from fixing the %XX%
bug), and replace the OUTCHR and OUTSTR macros. I wasn't suggesting
that the code be used verbatim.

Personally I would say that lex is exactly the right tool to use for
the job. A hand-written state machine in C isn't usually particularly
comprehensible.

> And my answer to this question is: Use one of the available and tested
> libraries for doing this. Unfortunately I do not have a pointer
> available (kind of offline at the moment) but it should not be too hard
> to locate.

Something as simple as this doesn't warrant the use of a dedicated
library, and the associated loss of control over the interface.

> Anyone daring to write his own CGI parsing should know that unless it is
> a protected CGI then you have to take great care in handling all kinds
> of strange input that may or may not be legal according to the encoding
> specifications.

This applies to CGI scripts generally, regardless of whether you
perform your own parsing, e.g. handling '/../' in filenames
constructed from user-supplied data.
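
As a rough illustration of the '/../' case, a check along the following
lines could be applied to a filename after URL-decoding. The function
name has_dotdot_component is made up for this sketch, and it only
covers ".." components; it does not deal with symlinks, absolute paths
or anything else a real script would need to consider.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return true if any '/'-separated component of
 * path is exactly "..". Intended to run on already-decoded input. */
bool has_dotdot_component(const char *path)
{
    const char *p = path;

    while (*p != '\0') {
        size_t n = strcspn(p, "/");   /* length of the current component */

        if (n == 2 && p[0] == '.' && p[1] == '.')
            return true;
        p += n;
        if (*p == '/')                /* step over the separator */
            p++;
    }
    return false;
}
```

A script would typically reject the request outright when this returns
true, rather than trying to "clean up" the path.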

> For a C programmer the biggest pit is buffer overflow,

This isn't that hard to avoid. You just need to use a dynamically
allocated buffer, and realloc() it as necessary. With glibc, you can
use open_memstream() and fputc()/fputs().
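
A minimal sketch of the open_memstream() approach, assuming a
POSIX.1-2008 system such as glibc. The function name url_decode and
the policy of passing malformed escapes through unchanged are choices
made for this example, not something taken from the lex program. The
stream grows the buffer as needed, so no fixed-size array is involved.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Value of a hex digit, or -1 if c is not one. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decode a URL-encoded string into a freshly
 * allocated, NUL-terminated buffer. Returns NULL on failure; the
 * caller must free() the result. Note that a decoded %00 ends up as
 * an embedded NUL byte, which callers may want to reject. */
char *url_decode(const char *src)
{
    char *buf = NULL;
    size_t len = 0;
    FILE *out = open_memstream(&buf, &len);
    int hi, lo;

    if (out == NULL)
        return NULL;

    for (; *src != '\0'; src++) {
        if (*src == '+') {
            fputc(' ', out);
        } else if (*src == '%'
                   && (hi = hex_value((unsigned char)src[1])) >= 0
                   && (lo = hex_value((unsigned char)src[2])) >= 0) {
            fputc(hi * 16 + lo, out);   /* %XX: two hex digits, no trailing % */
            src += 2;
        } else {
            fputc(*src, out);           /* malformed escapes pass through */
        }
    }

    if (fclose(out) != 0) {             /* fclose() finalises buf and len */
        free(buf);
        return NULL;
    }
    return buf;
}
```

Calling url_decode("a%20b+c") should yield a buffer containing
"a b c", which the caller releases with free().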

> for a shell/perl programmer it is input encoded to execute other
> commands.

Yep.

-- 
Glynn Clements <[EMAIL PROTECTED]>
