Only the first part of Said() is covered, though.
Lars
The black box: The magic behind Sierra's text parser, part 2
By Lars Skovlund
Version 0.1, 4. December 1999. Incomplete!
After parsing the user input, the Said() kernel call is invoked multiple
times to determine what the user intended to do. It takes a pointer to a
so-called said spec (an entry in the said block) as input, and returns
whether it matches the user input or not.
When Parse() was called, a Said event was passed to it. We need that event
record now, to avoid the same user request being handled twice (possibly
with two different results). Unfortunately, a pointer to it is not passed to
Said(), so Parse() must store it for us. Anyway, we start by checking that
it is, in fact, a said event (type 0x80), and that it hasn't been handled
(or "claimed", as they say) yet.
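A minimal sketch of this bookkeeping follows. The `Event` record with its `type` and `claimed` fields, and the `parse()` and `said_may_run()` helpers, are hypothetical stand-ins; only the type value 0x80 and the "store it in Parse(), check it in Said()" idea come from the text above.

```python
from dataclasses import dataclass
from typing import Optional

SAID_EVENT_TYPE = 0x80  # event type of a said event, per the text


@dataclass
class Event:
    """Hypothetical stand-in for an SCI event record."""
    type: int
    claimed: bool = False


# Said() is not handed the event, so Parse() leaves it here for later calls.
stored_said_event: Optional[Event] = None


def parse(event: Event) -> None:
    """Hypothetical Parse() hook: remember the event for subsequent Said() calls."""
    global stored_said_event
    stored_said_event = event
    # ... the actual parsing of the user input would happen here ...


def said_may_run() -> bool:
    """True only if the stored event is a said event nobody has claimed yet."""
    ev = stored_said_event
    return ev is not None and ev.type == SAID_EVENT_TYPE and not ev.claimed
```

Once some handler sets `claimed` on the event, every later Said() call for the same input bails out early, which is what keeps one request from being handled twice.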
Since the said spec is a mix of byte- and word-sized values, we start by
extending everything to 16 bits (command codes, i.e. the byte values, are
shifted into the upper byte; word numbers are left alone). This ensures
that we won't have to do costly type casts later on. There are also other
reasons specific to Sierra's SCI implementation which are irrelevant to us.
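The widening step can be sketched as below. This assumes, beyond what the text states, that a word number is stored as two bytes with the high byte first, and that the byte 0xFF terminates the spec; `expand_said_spec` is a hypothetical name. Because a word's high byte is always below 0xF0, widened words (below 0xF000) can never collide with widened commands (0xF000 and up).

```python
from typing import List

COMMAND_MIN = 0xF0  # bytes 0xF0-0xFF are said-spec command codes
END_OF_SPEC = 0xFF  # assumed terminator byte


def expand_said_spec(spec: bytes) -> List[int]:
    """Widen a said spec into a list of 16-bit values.

    Command bytes are shifted into the upper byte (0xF9 -> 0xF900).
    Word numbers (assumed: two bytes, high byte first) are kept as-is.
    """
    out: List[int] = []
    i = 0
    while i < len(spec):
        b = spec[i]
        i += 1
        if b == END_OF_SPEC:
            break  # stop at the terminator without emitting it
        if b >= COMMAND_MIN:
            out.append(b << 8)
        else:
            if i >= len(spec):
                raise ValueError("truncated word number in said spec")
            out.append((b << 8) | spec[i])
            i += 1
    return out
```

For example, the bytes `01 23 F9 FF` widen to the word 0x0123 followed by the command 0xF900.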
The Said() process itself is divided into two major parts: generating a tree
for the said spec, and comparing it to the tree generated earlier by Parse().
Thus, the said spec is only used during the first step, and the parse tree
is not used until the second.
By the time we reach the second phase, the said spec commands (0xF0-0xFF)
have been expanded into storage codes in the said tree. These storage codes
are then used to navigate recursively down the parse tree, and some of the
nodes are compared.
The tree is created with a top that looks like the one from Parse(), i.e.:
          ----------
         /          \
      0x141      (branch)
                    /
                   /
                0x13F
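As a small sketch, the fixed top can be built from two hypothetical node types, `Leaf` (holding a 16-bit value) and `Branch` (with left and right children). Two details are assumptions: that 0x13F hangs off the right-hand (branch) node, as in the Parse() tree, and that the rest of the said tree attaches to that node's right branch.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Leaf:
    value: int


@dataclass
class Branch:
    left: Optional["Node"] = None
    right: Optional["Node"] = None


Node = Union[Leaf, Branch]


def make_said_top(rest: Optional[Node]) -> Branch:
    """Build the fixed top of the said tree; `rest` is the tree for the spec proper."""
    # Assumption: 0x13F is the left child of the right-hand (branch) node,
    # and the remainder of the said tree goes in that node's right branch.
    return Branch(Leaf(0x141), Branch(Leaf(0x13F), rest))
```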
The following is a reconstruction of the said spec syntax in BNF-like
notation:
Subexpression : Expression
              | Expression < Subexpression
              | Expression [ < Subexpression ]
              | /* empty */
              ;
Expression    : MainExp
              | MainExp , Expression
              ;
MainExp       : [ Subexpression ]
              | ( Subexpression )
              | Wordgroup
              ;
BeforeExp     : / Subexpression
              | /* empty */
              ;
NestedBefore  : BeforeExp
              | [ BeforeExp ]
              ;
MoreAfter     : >
              | /* empty */
              ;
said_spec     : Subexpression NestedBefore MoreAfter
              | Subexpression NestedBefore NestedBefore MoreAfter
              ;
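To check that the grammar hangs together, here is a hedged recursive-descent recognizer for it. For readability it uses a hypothetical token representation (ints for word groups, one-character strings for `, < [ ] ( ) / >`) instead of the 16-bit command codes, and it only accepts or rejects a spec; it does not build the said tree. Two lookahead decisions are needed: `[ /` opens a NestedBefore and `[ <` an optional `<` part, so neither may start a MainExp.

```python
from typing import Optional, Sequence, Union

Token = Union[int, str]  # int = word group, str = one of , < [ ] ( ) / >


class SaidSyntaxError(Exception):
    pass


class SaidRecognizer:
    """Recursive-descent recognizer for the said spec grammar (sketch)."""

    def __init__(self, tokens: Sequence[Token]) -> None:
        self.toks = list(tokens)
        self.pos = 0

    def peek(self, k: int = 0) -> Optional[Token]:
        i = self.pos + k
        return self.toks[i] if i < len(self.toks) else None

    def expect(self, tok: str) -> None:
        if self.peek() != tok:
            raise SaidSyntaxError(f"expected {tok!r} at position {self.pos}")
        self.pos += 1

    def starts_main_exp(self) -> bool:
        t = self.peek()
        if isinstance(t, int) or t == "(":
            return True
        # "[ /" opens a NestedBefore and "[ <" an optional "<", not a MainExp.
        return t == "[" and self.peek(1) not in ("/", "<")

    def main_exp(self) -> None:
        t = self.peek()
        if t in ("[", "("):
            self.pos += 1
            self.subexpression()
            self.expect("]" if t == "[" else ")")
        elif isinstance(t, int):
            self.pos += 1  # Wordgroup
        else:
            raise SaidSyntaxError(f"unexpected {t!r} at position {self.pos}")

    def expression(self) -> None:
        self.main_exp()
        while self.peek() == ",":
            self.pos += 1
            self.main_exp()

    def subexpression(self) -> None:
        if not self.starts_main_exp():
            return  # the empty alternative
        self.expression()
        if self.peek() == "<":
            self.pos += 1
            self.subexpression()
        elif self.peek() == "[" and self.peek(1) == "<":
            self.pos += 2
            self.subexpression()
            self.expect("]")

    def before_exp(self) -> None:
        if self.peek() == "/":
            self.pos += 1
            self.subexpression()

    def nested_before(self) -> None:
        if self.peek() == "[":
            self.pos += 1
            self.before_exp()
            self.expect("]")
        else:
            self.before_exp()

    def said_spec(self) -> None:
        self.subexpression()
        # NestedBefore may be empty, so both said_spec alternatives
        # reduce to "up to two NestedBefore parts".
        self.nested_before()
        self.nested_before()
        if self.peek() == ">":
            self.pos += 1
        if self.pos != len(self.toks):
            raise SaidSyntaxError(f"trailing input at position {self.pos}")


def is_valid_said_spec(tokens: Sequence[Token]) -> bool:
    try:
        SaidRecognizer(tokens).said_spec()
        return True
    except SaidSyntaxError:
        return False
```

With made-up word numbers 1 ("look") and 2 ("door"), `[1, "/", 2]` is accepted while `[1, 2]` is rejected, since two word groups may only be joined by a comma.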
Each "lexeme" (well, most of them) is added to the tree for later use by the
second phase of Said(). This is done by a process called "augmenting" the
tree. The augmentation process joins a sub-expression with the main tree,
adding two descriptive storage codes. The result of an augmentation is
sketched below:
          (parent node in main tree)
                 /          \
                /            \
        (not assigned)    tree node
                             /
                            /
                        tree node
                         /     \
                        /       \
                 nodeval1    (beginning of subtree)
                                 /
                                /
                            nodeval2
The right branch of the subtree, of course, contains the subtree information.
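One way to express a single augmentation in code, reading the diagram as the node layout, is sketched below. The `Leaf`/`Branch` types and the `augment` function are hypothetical. It is also an assumption that the outer "tree node" goes in the parent's right branch and that the branches the diagram leaves blank simply stay empty.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Leaf:
    value: int


@dataclass
class Branch:
    left: Optional["Node"] = None
    right: Optional["Node"] = None


Node = Union[Leaf, Branch]


def augment(parent: Branch, subtree: Optional[Node],
            nodeval1: int, nodeval2: int) -> Branch:
    """Join `subtree` to `parent`, tagged with two storage codes."""
    beginning = Branch(Leaf(nodeval2), subtree)  # "(beginning of subtree)"
    tagged = Branch(Leaf(nodeval1), beginning)   # inner "tree node"
    node = Branch(left=tagged)                   # outer "tree node"
    parent.right = node  # parent's left branch stays unassigned
    return node
```

Augmenting a word group with the MainExp/Wordgroup pair (0x141, 0x153) thus yields 0x141 on the inner node's left, 0x153 on the left of "(beginning of subtree)", and the word itself on its right.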
Below is a listing of the storage codes used for the augmentation in various
places. The names I use correspond to the yacc-style grammar above:
Lexeme                          nodeval1  nodeval2  Comments
said_spec/Subexpression         0x141     0x149
Subexpression/Expression        0x141     0x14F
MainExp/[Subexpression]         0x152     0x14C
MainExp/(Subexpression)         0x141     0x14C
MainExp/Wordgroup               0x141     0x153     **
Subexpression/<Subexpression    0x144     0x14F
Subexpression/<Subexpression    0x141     0x144     *
Subexpression/[<Subexpression]  0x152     0x144
NestedBefore/BeforeExp          0x142     0x14A     first time
NestedBefore/[BeforeExp]        0x152     0x142     first time
NestedBefore/BeforeExp          0x143     0x14A     second time
NestedBefore/[BeforeExp]        0x152     0x143     second time
MoreAfter/>                     0x14B     0xF900
* Subexpression/<Subexpression uses recursion to handle every <. The "newest"
(innermost) recursion layer augments with the pair 0x144/0x14F, and the layer
above it in turn adds that subtree using the pair 0x141/0x144.
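The table can also be read as a lookup. The sketch below is a hypothetical helper, not Sierra's code: it stores the fixed pairs in a dict (the "innermost"/"enclosing" labels on the two < rows follow the footnote) and derives the NestedBefore pair from whether it is the first or second occurrence and whether it is bracketed. It also reflects a pattern visible in the table: a bracketed form uses 0x152 as nodeval1 and moves the unbracketed nodeval1 into nodeval2. Note that 0xF900 is the command byte 0xF9 shifted into the upper byte, as in the 16-bit expansion described earlier.

```python
from typing import Dict, Tuple

# (nodeval1, nodeval2) pairs copied from the table.
AUGMENT_CODES: Dict[str, Tuple[int, int]] = {
    "said_spec/Subexpression": (0x141, 0x149),
    "Subexpression/Expression": (0x141, 0x14F),
    "MainExp/[Subexpression]": (0x152, 0x14C),
    "MainExp/(Subexpression)": (0x141, 0x14C),
    "MainExp/Wordgroup": (0x141, 0x153),
    "Subexpression/<Subexpression (innermost)": (0x144, 0x14F),
    "Subexpression/<Subexpression (enclosing)": (0x141, 0x144),
    "Subexpression/[<Subexpression]": (0x152, 0x144),
    "MoreAfter/>": (0x14B, 0xF900),
}


def nested_before_codes(occurrence: int, bracketed: bool) -> Tuple[int, int]:
    """Codes for the first (occurrence=1) or second (occurrence=2) NestedBefore."""
    if occurrence not in (1, 2):
        raise ValueError("said_spec has at most two NestedBefore parts")
    marker = 0x142 if occurrence == 1 else 0x143
    # Bracketed: 0x152 first, then the marker; plain: marker, then 0x14A.
    return (0x152, marker) if bracketed else (marker, 0x14A)
```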
The second said phase will be covered in another document, Real Soon Now.