Preliminary Said() docs

Lars Skovlund Sun, 12 Dec 1999 13:27:00 -0800
Only the first part of Said() is covered, though.

Lars


-- Attached file included as plaintext by Listar --
-- File: said.txt

The black box: The magic behind Sierra's text parser, part 2
By Lars Skovlund

Version 0.1, 4. December 1999. Incomplete!

After parsing the user input, the Said() kernel call is invoked multiple
times to determine what the user intended to do. It takes a pointer to a
so-called said spec (an entry in the said block) as input, and returns
whether it matches the user input or not.

When Parse() was called, a said event was passed to it. We need that event
record now, to avoid the same user request being handled twice (possibly
with two different results). Unfortunately, such a pointer is not passed to
Said(), so Parse() must store it for us. Anyway, we start by checking that
it is, in fact, a said event (type 0x80), and that it hasn't been handled
(or claimed, as they say) yet.
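
A minimal sketch of that guard, in Python for brevity. The Event record, its
field names, and the module-level slot that Parse() fills are all hypothetical
stand-ins; only the type value 0x80 and the "not yet claimed" test come from
the description above.

```python
from dataclasses import dataclass
from typing import Optional

SAID_EVENT = 0x80  # event type of a said event, as stated in the text


@dataclass
class Event:
    """Hypothetical stand-in for the interpreter's event record."""
    type: int
    claimed: bool = False


# Hypothetical slot where Parse() leaves the event for later Said() calls.
last_parsed_event: Optional[Event] = None


def parse(event: Event) -> None:
    """Stand-in for Parse(): it only remembers the event it was handed."""
    global last_parsed_event
    last_parsed_event = event


def said_may_proceed() -> bool:
    """Return True only for an unclaimed said event stored by parse()."""
    ev = last_parsed_event
    return ev is not None and ev.type == SAID_EVENT and not ev.claimed
```

A Said() implementation would bail out with "no match" whenever
said_may_proceed() is false, before looking at the said spec at all.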

Since the said spec is a mix of byte- and word-sized values, we start by
extending everything to 16 bits (command codes - the byte values - are
shifted into the upper byte, the word numbers are left alone). This ensures
that we won't have to do costly type casts later on. There are also other
reasons specific to Sierra's SCI implementation which are irrelevant to us.
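
The widening step could look roughly like the sketch below. It assumes that a
word number is stored as two bytes with the high byte first, and that its
first byte is always below 0xF0; neither detail is stated above, so treat both
as assumptions. The function name is made up for the example.

```python
from typing import List

FIRST_COMMAND = 0xF0  # said spec command bytes occupy 0xF0-0xFF


def widen_said_spec(raw: bytes) -> List[int]:
    """Expand a raw said spec into a list of 16-bit values.

    Command bytes are shifted into the upper byte; word numbers are kept
    unchanged. ASSUMPTION: word numbers are two bytes, high byte first.
    """
    out = []
    i = 0
    while i < len(raw):
        b = raw[i]
        if b >= FIRST_COMMAND:
            # Command code: one byte in, upper byte of a 16-bit value out.
            out.append(b << 8)
            i += 1
        else:
            # Word number: combine this byte with the next one.
            if i + 1 >= len(raw):
                raise ValueError("truncated word number at offset %d" % i)
            out.append((b << 8) | raw[i + 1])
            i += 2
    return out
```

With these assumptions, the bytes 0x01 0x2C 0xF9 become the two values 0x012C
and 0xF900, the latter being the form in which the > command shows up in the
storage code table further down.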

The Said() process itself is divided into two major parts: generating a tree
for the said spec, and comparing it to the tree generated earlier by Parse().
Thus, the said spec is only used during the first step, and the parse tree
is not used until the second.

By the time we reach the second phase, the said spec commands (0xF0-0xFF)
have been expanded into storage codes in the said tree. These storage codes
are then used to navigate recursively down the parse tree, and some of the
nodes are compared. 

The tree is created with a top that looks like the one from Parse(), i.e.:

                          ----------
                         /          \
                      0x141      (branch) 
                                    /
                                  /
                               0x13F
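
One way to hold such a tree in memory, as a sketch only: the Branch class,
new_said_tree() and dump() are hypothetical names, and how Sierra's
interpreter actually lays out its nodes is not covered here.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Branch:
    """A two-way tree node; either side may be a value, a Branch or None."""
    left: Any = None
    right: Any = None


def new_said_tree() -> Branch:
    """Build the fixed top shown in the diagram above."""
    return Branch(left=0x141, right=Branch(left=0x13F))


def dump(node: Any) -> str:
    """Render a tree as nested parentheses, with '-' for an empty slot."""
    if node is None:
        return "-"
    if isinstance(node, Branch):
        return "(%s %s)" % (dump(node.left), dump(node.right))
    return "0x%X" % node
```

Here dump(new_said_tree()) gives "(0x141 (0x13F -))", which is the diagram
written on one line.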


The following is a reconstruction of the said spec syntax in BNF-like
notation:

Subexpression : Expression
              | Expression < Subexpression
              | Expression [ < Subexpression ]
              |  
              ;

Expression : MainExp
           | MainExp , Expression
           ;

MainExp : [ Subexpression ]
        | ( Subexpression )
        | Wordgroup
        ;

BeforeExp : / Subexpression
          |
          ;

NestedBefore : BeforeExp
             | [ BeforeExp ]
             ;

MoreAfter : >
          |
          ;

said_spec : Subexpression NestedBefore MoreAfter
          | Subexpression NestedBefore NestedBefore MoreAfter
          ;
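
To make the grammar a little more concrete, here is a small recursive-descent
recognizer for it. It is a sketch under several assumptions: punctuation is
spelled as one-character strings, a Wordgroup (which the grammar leaves
undefined) is represented by a plain integer, and where the grammar is
ambiguous (an empty "[ ]" can be read as a MainExp or as a NestedBefore) the
code simply takes the MainExp reading. The function names are hypothetical.

```python
class SaidSyntaxError(ValueError):
    """Raised when a token list does not fit the grammar."""


def check_said_spec(tokens):
    """Return True if `tokens` fits the grammar, else raise SaidSyntaxError."""
    pos = 0

    def peek(k=0):
        return tokens[pos + k] if pos + k < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SaidSyntaxError(
                "expected %r at token %d, found %r" % (expected, pos, tok))
        pos += 1
        return tok

    def starts_main_exp():
        tok = peek()
        if isinstance(tok, int) or tok == "(":
            return True
        # "[ /" opens a NestedBefore and "[ <" the optional '<' form,
        # so neither of them starts a MainExp.
        return tok == "[" and peek(1) not in ("/", "<")

    def main_exp():
        tok = take()
        if tok == "[":
            subexpression()
            take("]")
        elif tok == "(":
            subexpression()
            take(")")
        elif not isinstance(tok, int):
            raise SaidSyntaxError("unexpected token %r" % (tok,))

    def expression():
        main_exp()
        while peek() == ",":
            take(",")
            main_exp()

    def subexpression():
        if not starts_main_exp():
            return  # the empty alternative
        expression()
        if peek() == "<":
            take("<")
            subexpression()
        elif peek() == "[" and peek(1) == "<":
            take("[")
            take("<")
            subexpression()
            take("]")

    def before_exp():
        if peek() == "/":
            take("/")
            subexpression()

    def nested_before():
        if peek() == "[":
            take("[")
            before_exp()
            take("]")
        else:
            before_exp()

    # said_spec: BeforeExp may be empty, so two NestedBefore calls
    # cover both alternatives.
    subexpression()
    nested_before()
    nested_before()
    if peek() == ">":
        take(">")
    if peek() is not None:
        raise SaidSyntaxError("trailing token %r" % (peek(),))
    return True


def is_valid_said_spec(tokens):
    """Boolean wrapper around check_said_spec()."""
    try:
        return check_said_spec(tokens)
    except SaidSyntaxError:
        return False
```

For instance, is_valid_said_spec([5, "/", 6, "<", 7, ">"]) is accepted, while
a list with an unclosed "(" or with three "/" parts is rejected.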

Each "lexeme" (well, most of them) is added to the tree for later use by the
second phase of Said(). This is done by a process called "augmenting" the
tree. The augmentation process joins a sub-expression with the main tree,
adding two descriptive storage codes. The result of an augmentation is
sketched below:

                (parent node in main tree)
                         /      \
                       /         \
             (not assigned)   tree node
                                 /   
                               /      
                          tree node
                        /       \
                      /          \
                  nodeval1    (beginning of subtree)  
                                 /
                                /
                           nodeval2

The right branch of the node marked "(beginning of subtree)" holds the
sub-expression's own nodes.
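
Read literally, the diagram suggests an operation like the sketch below. The
Branch class and the augment() signature are hypothetical; the sketch also
assumes that "(not assigned)" means the parent's left side is simply left
alone, and that the new link node's right side stays empty.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Branch:
    """A two-way tree node; either side may be a value, a Branch or None."""
    left: Any = None
    right: Any = None


def augment(parent: Branch, nodeval1: int, nodeval2: int,
            subtree: Any) -> Branch:
    """Attach `subtree` under `parent` as in the diagram.

    Returns the new node hanging off the parent's right side.
    """
    body = Branch(left=nodeval2, right=subtree)  # "(beginning of subtree)"
    entry = Branch(left=nodeval1, right=body)    # carries nodeval1
    link = Branch(left=entry)                    # right side stays empty
    parent.right = link                          # parent.left is not touched
    return link
```

As a usage example, augment(parent, 0x141, 0x153, 0x123) would attach a single
word group using the MainExp/Wordgroup codes from the table below; the word
group number 0x123 is invented, and using the bare number as the subtree is
an assumption as well.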

Below is a listing of the storage codes used for the augmentation in various
places. The names I use correspond to the BNF-like grammar above:

Lexeme                          nodeval1        nodeval2        comments

said_spec/Subexpression         0x141           0x149
Subexpression/Expression        0x141           0x14F
MainExp/[Subexpression]         0x152           0x14C
MainExp/(Subexpression)         0x141           0x14C
MainExp/Wordgroup               0x141           0x153           ** 
Subexpression/<Subexpression    0x144           0x14F
Subexpression/<Subexpression    0x141           0x144           *
Subexpression/[<Subexpression]  0x152           0x144           
NestedBefore/BeforeExp          0x142           0x14A           first time
NestedBefore/[BeforeExp]        0x152           0x142           first time
NestedBefore/BeforeExp          0x143           0x14A           second time
NestedBefore/[BeforeExp]        0x152           0x143           second time
MoreAfter/>                     0x14B           0xF900

* Subexpression/<Subexpression uses recursion to handle all <'s. The "newest"
  recursion layer augments with the combination 0x144 0x14F and the one above
  it in turn adds that subtree using the combination 0x141 0x144.

The second said phase will be covered in another document, Real Soon Now.