Hi Aurélien,

I finally found some time to review your patch, really sorry for the
long delay, but bug fixing passes first when they are sensitive :-/

On Sat, Mar 24, 2018 at 11:16:03PM +0100, Aurélien Nephtali wrote:
> From 53356f83dab6483512d2db1fae87abd24beca4c0 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Aur=C3=A9lien=20Nephtali?= <aurelien.nepht...@corp.ovh.com>
> Date: Thu, 15 Mar 2018 15:44:19 +0100
> Subject: [PATCH] MEDIUM: cli: Add multi-line mode support
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> Add a new command to toggle multi-line mode: multiline.
> Once activated, all commands must end with an empty line.
> 
> The multi-line mode enables line feeds to be used anywhere in the
> request as long it stays syntactically correct once reassembled.
> This mode should be particularly useful for scripts or when inputing
> data that can contain line feeds.
> 
> Examples:
> $ echo -e "multiline;set ssl ocsp-response $(base64 ocsp.der)\n" | socat
> /tmp/sock1 -
> $ echo -e "multiline ; show\nstat\n" | socat /tmp/sock1 -

I must confess I feel a bit embarrassed by this. I think it isn't a good
idea for the long term to mix arguments and payload. And the example you
put below in the doc scares me a little bit :

    > set
    + server TEST/A
    + state
    + ready
    +

It's a bit complicated for me to express why I'm not thrilled, I just
feel it's not right, without having a strong argument. Among the things
I'm thinking about is the fact that the command line arguments are
delimited by spaces, so above, "server" and "TEST/A" are two different
arguments. Then I don't see how we can properly process a *real* payload
containing spaces, like a PEM formated certificate, without starting to
run into dirty hacks.

Also, I'm pretty sure that most, if not all, of the commands consuming a
payload will know how to process it based on the command line arguments.
So making the payload appear as an argument (or mixed with them) will
make it harder to conditionally process them.

Another possibly more problematic point is that if you pass a payload as
an argument, it will be processed at once, thus will always be limited
to a buffer size, so we won't be able to use this to update maps or ACLs
for example.

I agree with the point you made about keeping the parser independant of
the commands so that it doesn't have to know which one takes a payload
and which one not. But here I fear that we'll end up causing the opposite
problem by forcing all commands to run into tricks to distinguish their
args and the payload. For example, let's say we use it to add a map
between HTTP status codes and reasons :

  > add map http_status
  + 200 OK
  + 206 Partial Content
  + 302 Found
  + 303 See Other
  + 304 Not Modified
  ...

If it's cut along words, you can as well enter it this way :

  > add map
  + http_status
  + 200 OK
  + 206 Partial Content
  + 302 Found
  + 303 See Other
  + 304 Not Modified
  ...

Or even this way :

  > add map
  + http_status 200 OK
  + 206 Partial Content
  + 302 Found
  + 303 See Other
  + 304 Not Modified
  ...

Or this way as well :

  > add map
  + http_status 200 OK 206 Partial Content 302 Found 303 See Other 304 Not 
Modified
  ...

You probably see the problem : there's no way to cut the arguments anymore.
By having arguments and payload separate, both can continue to use their
respective, non-colliding syntaxes, so that the command's arguments are
delimited by spaces and only spaces, and are all present once the command
is started, and that the payload's syntax solely depends on the command
and is a byte stream ending on an empty line (ie \n\n). This way commands
consuming multiple entries at once (like maps, acls, crtlists) can easily
keep the on-disk file format and consider that one line equals one entry,
and other commands taking more complex formats (like certificates) can use
a different assumption, including the option to pass the whole stream to
an underlying framework (typically openssl).

Given that there are very few commands consuming a payload, I think we
should not use a sticky mode. The multiline prefix should only apply to
the next command so that there is no need to do this :

  > multiline;add map http_status
  + 200 OK
  + 206 Partial Content
  + 302 Found
  + 303 See Other
  + 304 Not Modified
  +
  > multiline

It could thus be shortened (eg: "ml") or even be a different character,
though I don't really see which one.

Or probably much better, we could end a command with a character indicating
that a payload follows, a bit a-la "<< EOF" we're used to see in shell.
Maybe just using "<<" at the end of a line could be sufficient to indicate
that a payload follows and that it must be ended by an empty line. Eg:

  > add map http_status <<
  + 200 OK
  + 206 Partial Content
  + 302 Found
  + 303 See Other
  + 304 Not Modified
  +
  >

The benefit here is that it's still the parser which knows whether or
not a payload follows, regardless of the command, and can simply pass
a flag to the command saying "you have a payload to read".

While typing this I was also thinking that we could also put this as a
flag in the CLI keywords registration (ie takes a payload or not) but
I'm not convinced this would be a good idea because it would mean that
the user (or script) would have to know whether or not the command
expects a payload, and could quickly get out of sync on the first mistake.

I think that it is reasonable for a first implementation to have some
technical limitations, such as supporting only a single-buffer payload
(ie not implement the streaming mode first). It can be documented that
no more than "tune.bufsize" bytes may be sent at once in a single
command, and scripts feeding large data chunks will simply have to cut
them into pieces (eg: ACLs and maps). This will allow you to make the
parser wait for the empty line when seeing that a multi-line block is
being sent, and simply pass the pointer to this block to the command,
doing everything at once (which is less complex). And this will not
prevent us from removing this limitation later if/once it really
becomes annoying.

Just a very minor cosmetic comments below :

> diff --git a/include/types/applet.h b/include/types/applet.h
> index 89c318c1d..9789e202a 100644
> --- a/include/types/applet.h
> +++ b/include/types/applet.h
> @@ -57,8 +57,11 @@ struct appctx {
>       /* 3 unused bytes here */
>       unsigned short state;      /* Internal appctx state */
>       unsigned int st0;          /* CLI state for stats, session state for 
> peers */
> -     unsigned int st1;          /* prompt for stats, session error for peers 
> */
> +     unsigned int st1;          /* prompt/multiline for stats, session error 
> for peers */
> +#define APPCTX_CLI_ST1_PROMPT    (1 << 0)
> +#define APPCTX_CLI_ST1_MULTILINE (1 << 1)

Please don't put #define in the middle of the structs, they're harder to
find later when reading the code, and it's harder also to make other
defines depend on them. Better place them at the top of the file with the
other ones. Simply mention in st1 that it makes use of APPCTX_CLI_ST1_*
and that's fine.

>       unsigned int st2;          /* output state for stats, unused by peers  
> */
> +     struct chunk *chunk;       /* used to store unfinished commands */
>       struct applet *applet;     /* applet this context refers to */
>       void *owner;               /* pointer to upper layer's entity (eg: 
> stream interface) */
>       struct act_rule *rule;     /* rule associated with the applet. */

Thank you!
Willy

Reply via email to