On Tue, Aug 26, 2014 at 10:42:03PM +0100, Dominik Vogt wrote:
> 1. A user interface function that takes the F_CMD_ARGS argument
>    list as it does now.  Its tasks are:
> 
>    a) To prepare a structure that contains the syntax tables for
>       the command and any context information that is necessary
>       to parse the command syntax properly (example: The "last"
>       option of the WindowShade command is not always valid but
>       only under certain circumstances; the parser needs to know
>       about that).
>
>    b) To call the command syntax parser with the input structure
>       prepared in (a).  The parser returns an output structure
>       describing the parse values and an error code.
> 
>    c) In case an error occurs, the user interface function
>       generates an error message (if necessary).

OK, so you have a single entry function which is responsible for
dispatching commands for fvwm to try and process.  I get that.  I'm
thinking now about the parsing mechanics.  There are two approaches to
my mind:

 * We have a big list of commands as we do now, but instead they're
   stored in a structure with callbacks.  Each command has its own
   definition, and the callbacks let it prepare() and then exec()
   itself.  So for example:

        struct single_cmd {
            /* The actual command name */
            const char *name;

            /* What to print on syntax error? */
            const char *usage;

            /* Tokenise/validate the arguments for this command */
            enum cmd_state (*prepare)(const char *, struct tokens *);

            /* Run the command proper */
            enum cmd_state (*exec)(F_CMD_ARGS);
        };

 * Then we have a big list of these---let's say to replace the ones in
   mvwm/commands.h---and you can imagine something like (a lookup
   sketch over this table follows after these definitions):

        extern const struct single_cmd *all_cmds[];
        extern const struct single_cmd cmd_break;
        extern const struct single_cmd cmd_close;
        /* Repeat ad nauseam */

        const struct single_cmd *all_cmds[] = {
            &cmd_break,
            &cmd_close,
            NULL
        };

* Each CMD_*() as we have it now has one of these somewhere:

        /* Example definition of cmd_break */
        const struct single_cmd cmd_break = {
            "Break",
            "Break [levels]",
            cmd_break_prepare,
            cmd_break_exec
        };
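
The controlling function then only needs to walk all_cmds[] to find the
entry for a given name.  As a rough sketch---find_cmd() is just a name
I've picked for illustration, strcasecmp() comes from <strings.h>, and
names are matched case-insensitively, as command names are now:

        /* Walk the table above and return the matching command, or
         * NULL if it's not one we know about. */
        static const struct single_cmd *
        find_cmd(const char *name)
        {
            const struct single_cmd **cmd;

            for (cmd = all_cmds; *cmd != NULL; cmd++) {
                if (strcasecmp((*cmd)->name, name) == 0)
                    return (*cmd);
            }

            return (NULL);
        }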

And the controlling interface function can just call into each of
->prepare() and ->exec() as it deems fit; keeping track of the error
state, and throwing the command out as it does now.
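
Purely as a sketch of that flow---dispatch_cmd() and the enum cmd_state
values (CMD_OK, CMD_ERROR) are placeholders of mine, and I'm assuming
we keep the existing F_CMD_ARGS/F_PASS_ARGS pairing for exec():

        /* Look the command up, let it prepare()/tokenise its
         * arguments, report an error if that fails, otherwise run
         * it.  F_PASS_ARGS forwards the F_CMD_ARGS parameters. */
        enum cmd_state
        dispatch_cmd(const char *name, const char *args, F_CMD_ARGS)
        {
            const struct single_cmd *cmd;
            struct tokens tok = { NULL, NULL, NULL };
            enum cmd_state state;

            if ((cmd = find_cmd(name)) == NULL)
                return (CMD_ERROR);

            if ((state = cmd->prepare(args, &tok)) != CMD_OK) {
                fprintf(stderr, "Usage: %s\n", cmd->usage);
                return (state);
            }

            return (cmd->exec(F_PASS_ARGS));
        }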

In terms of free()ing the structure, that should be consistent enough
not to need a callback at all---though we could always introduce one
and, if it's NULL, simply not call it.
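
Picking up the sketch above, that's no more than the following, with a
hypothetical ->cleanup member that isn't in struct single_cmd as shown:

        /* Optional hook: only commands that need to release their
         * tokenised data define one. */
        if (cmd->cleanup != NULL)
            cmd->cleanup(&tok);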

I think that summarises what you're referring to, Dominik, and that fits
rather nicely with what I had in mind.  What's nice about this split of
having:

interface function -> prepare() -> [maybe error] -> exec()

is that the interface function can do a lot of the heavy lifting for
us; we can concentrate on the prepare() side in isolation.

Note that I've hand-waved the actual parsing details.  I'll say a few
things on those now:

 * I'm keen to have tokenised data as an output, perhaps in the form:

        struct tokens {
            const char *key;
            const char *value;

            struct tokens *next;
        };

 The reason is that if commands tokenise their data in a meaningful
 way, we can pass a pointer around to common parsing routines and have
 the description of that data remain the same wherever it's used (for
 example when parsing move/resize commands).  It would also make
 transitioning to a new parser easier, because the data is always in a
 known format for the common areas, reducing the need for additional
 parsing per command.

* Tokenisation in key/value form means that each command can look up
  the data in the same way (a sketch of such helpers follows after this
  list).  For instance:

        if (has_token("level")) {
            int l = strtonum(get_token("level"), 1, INT_MAX, NULL);
            if (l > 10)
                fprintf(stderr, "Sorry, break level too high!\n");
        }

 This says nothing about the type; conversion to other types can happen
 independently as needed (hence the strtonum() call above).

* Note that the semantics of a command don't change---there might be an
  upfront overhead in terms of structuring the tokenised data, but I
  think this flexibility really does allow for some interesting things.

    - For now the key name is purely an internal mechanism, because
      we're not changing the semantics at this point, but when we do,
      it will make the transition easier.
    - The "key" could then be whatever new identifier per command(s) we
      want, including common idioms for establishing how we define
      move/resize arguments, etc.
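
To make the has_token()/get_token() calls above concrete, the helpers
could be as simple as the following.  The names are ones I've invented;
in the snippet further up I left out the list-head argument for
brevity, but passing the head explicitly is also how the shared
routines mentioned above would receive it (strcmp() is from <string.h>;
a real version might well want case-insensitive keys instead):

        /* Illustrative only: walk the token list for a key and hand
         * back its value, so any command (or shared routine) given
         * the list head reads the data the same way. */
        static const char *
        get_token(struct tokens *head, const char *key)
        {
            struct tokens *t;

            for (t = head; t != NULL; t = t->next) {
                if (strcmp(t->key, key) == 0)
                    return (t->value);
            }

            return (NULL);
        }

        static int
        has_token(struct tokens *head, const char *key)
        {
            return (get_token(head, key) != NULL);
        }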

Note that the "struct tokens" example doesn't take into account function
parsing semantics of multiple values for different functions, in
specific orderings.

I didn't really want to dive into the mechanics in quite this much
detail, but I'm putting it in these terms up front now because this is
how I've thought we might end up parsing things, and because some form
of this---even if it's not the final thing---can still act as an
intermediary whilst we think about things.

Thanks, Dominik, for this.  I hope I've not misrepresented any of your
points.

Kindly,

-- Thomas Adam

-- 
"Deep in my heart I wish I was wrong.  But deep in my heart I know I am
not." -- Morrissey ("Girl Least Likely To" -- off of Viva Hate.)
