Re: (SPAM?) space-separated tokens (FAQ?)
On Wed, Jun 29, 2005 at 09:01:19AM -0700, Ron Smith wrote:

NO, we *don't* want the white space, if we can avoid it.

Okay, but why? Or would that be an entire treatise?

Hmmm. Is it 'together' or 'to get her'? Who is she? Who's on first?

Touché. So would a spaceless grammar have to make special provisions for any and all foreseeable ambiguities such as this?

Isthatreallyhowyoureadtext?IfsothenIcanreallysaveawholelotofwearandtearonmythumbsbynotbotheringtoeverpressthespacebaronthiskeyboard!Thankyouverymuchforthishelp,Iwilltreasureitalways.Wasthata'spacebar'ora'spacebaron'?Whocares,asthereisnospace.Wewantspacescanyoutellushoworisitjustnotapossibility?

Originally, I debated whether or not to respond to this in the above, without any white space.

Glad you changed your mind

OK, there is more than one way to do it. Way number one:

lat: (i | o | /m\b/) {$SFNParse::abbrevs{$item[1]};}

The /m\b/ forces a break, usually with white space given your grammar. Doing this gives:

Okay! So any character in our grammar which conceivably could cause 'overlapping' interpretations can be 'escaped' by enforcing a break.

thumb pick up far little finger string
thumb pick up top far outer little finger string
thumb move under near forefinger string, pick up center near inner little finger string
thumb pick up near forefinger string, move under center near inner little finger string (this is now right - maybe?)
thumb move under near forefinger string, pick up center near inner little finger string (Yucch! But it parses)

Which I think is what you want. (You never *did* say what you were looking *for*. So I'm guessing here.)

That is correct. Here is perhaps a better illustration, using the break method:

1 pu 2nm mu 5cni
thumb pick up near middle forefinger string, move under center near inner little finger string

Correct.

1pu2nmmu5cni
thumb pick up near forefinger string

And doesn't work because no break.
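To see why the break method behaves this way, here is a minimal sketch of the same word-boundary test using Python's re module as a stand-in. It is not the Parse::RecDescent grammar itself; the helper name middle_matches is hypothetical, and the sample strings are simply the text remaining after '2n' in the inputs above.

```python
import re

# Stand-in for the /m\b/ alternative of the 'lat' rule: an 'm' that must be
# followed by a word boundary (white space, punctuation, or end of input).
LAT_WITH_BREAK = re.compile(r"m\b")

def middle_matches(rest):
    """Hypothetical helper: does 'm' (middle) match at the start of rest?"""
    return LAT_WITH_BREAK.match(rest) is not None

# Text remaining after '2n' in three of the sample inputs.
for rest in ("m mu 5cni", "mu 5cni", "mmu5cni"):
    print(repr(rest), middle_matches(rest))
```

Only the first string matches, because only there is the 'm' followed by a non-word character. In 'mu' the boundary test correctly keeps 'm' from swallowing the start of a move, but in 'mmu' it also rejects the genuine 'm', which is the failure reported for 1pu2nmmu5cni. Digits count as word characters too, so an 'm' directly followed by a digit would be rejected as well.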
Way number two is:

lat: (i | o | m ...!rel_move ) {$SFNParse::abbrevs{$item[1]};}

which is the lookahead I mentioned previously.

Whoops! Now for the two previous lines I get:

thumb pick up near forefinger string

for both, which is incorrect. Looks like I'll have to adopt method 1.

Hey, thank you for the help. Sorry we got off on the wrong foot initially (or wrong finger, I guess... Actually, the grammar used to have a 'foot' in it as a bodypart - so you could say 'tr F O'. Except it would probably come out 'transfer foots to mouth' due to some semi-kludgy plural-handling code I expunged from the sample...).

Scott
[Fwd: Re: (SPAM?) space-separated tokens (FAQ?)]
Scott wrote:

1pu2nmmu5cni
thumb pick up near forefinger string

And doesn't work because no break.

Way number two is:

lat:(i | o | m ...!rel_move ) {$SFNParse::abbrevs{$item[1]};}

which is the lookahead I mentioned previously.

Whoops! Now for the two previous lines I get:

thumb pick up near forefinger string

for both, which is incorrect. Looks like I'll have to adopt method 1.

I would still recommend method 2. What I was trying to show in the example fix is not a bulletproof fix that solves all of your problems, but an illustration of how lookahead can help resolve some of the ambiguities. How helpful lookahead can be really depends on the exact nature of the ambiguity, as your 'together' versus 'to get her' example illustrates. To correctly parse your example sentence you have to not only tokenize it correctly and correctly interpret the semantics, but you also have to *understand* that the sentence is probably referring to a stream as a sequence of things together rather than as a flowing body of water that you get her in, using a bunch of glommed things to do it with.

As you can tell by now, I *really* don't like depending on white space as a token separator. And yes, it does take on somewhat of a religious bent... It takes a bit more effort to figure it out, but one can usually resolve a problem without enforced white space. I don't know all of your grammar, and given your simple test case my guess would be that it is relatively complex. Figuring out exactly how and where to put in the lookahead conditions takes a bit of thought.
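The 'together' versus 'to get her' ambiguity can be made concrete with a small Python sketch. The word list and the function name segmentations are assumptions made for illustration only; the point is that a tokenizer working without separators can find more than one valid split, and nothing at the token level says which one was meant.

```python
def segmentations(text, words):
    """Return every way to split text into words drawn from the given set."""
    if not text:
        return [[]]  # one way to split the empty string: no words at all
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in words:
            # Keep this prefix and split whatever is left, recursively.
            for rest in segmentations(text[end:], words):
                results.append([head] + rest)
    return results

# Hypothetical vocabulary, just large enough to show the overlap.
WORDS = {"to", "get", "her", "together"}

for split in segmentations("together", WORDS):
    print(" ".join(split))
```

Both splits come back as equally valid. Lookahead can prune cases like this only when the following input settles the question; otherwise the choice needs semantic context.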
I made a small change to my way #2:

lat: (i|o|(...!rel_move m)){$SFNParse::abbrevs{$item[1]};}

Notice what a difference it makes:

1 pu 5f
thumb pick up far little finger string

1 pu 5tfo
thumb pick up top far outer little finger string

1 mu 2n pu 5cni
thumb move under near forefinger string, pick up center near inner little finger string

1 pu 2n mu 5cni # this is wrong
thumb pick up near forefinger string, move under center near inner little finger string (this is wrong)

1pu2nmu5cni # this is wrong
thumb pick up near forefinger string, move under center near inner little finger string (this is wrong)

1mu2npu5cni # Yucch! But it parses
thumb move under near forefinger string, pick up center near inner little finger string (Yucch! But it parses)

1 pu 2nm mu 5cni
thumb pick up near middle forefinger string, move under center near inner little finger string

1pu2nmmu5cni
thumb pick up near middle forefinger string, move under center near inner little finger string
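The effect of putting the negative lookahead before the 'm' can be sketched in Python, with the re module standing in for Parse::RecDescent. Everything here is an assumption made for illustration: the abbreviation tables are inferred from the sample output in this thread, the parse function and regex names are hypothetical, and the real grammar is certainly richer. The regex (?!pu|mu) plays the role of ...!rel_move.

```python
import re

# Abbreviation tables inferred from the sample output (assumed, incomplete).
FINGERS = {"1": "thumb", "2": "forefinger", "5": "little finger"}
MOVES = {"pu": "pick up", "mu": "move under"}
ATTRS = {"n": "near", "f": "far", "c": "center", "t": "top",
         "i": "inner", "o": "outer", "m": "middle"}

FINGER_RE = re.compile(r"\s*([125])")
MOVE_RE = re.compile(r"\s*(pu|mu)")
# Without lookahead: any attribute letter is consumed as soon as it is seen.
ATTR_GREEDY = re.compile(r"\s*([nfctiom])")
# With lookahead: refuse an attribute where a relative move could start.
ATTR_LOOKAHEAD = re.compile(r"\s*(?!pu|mu)([nfctiom])")

def parse(text, attr_re):
    """Parse 'finger (move finger attr*)+'; return a description or None."""
    text = text.strip()
    actor = FINGER_RE.match(text)
    if not actor:
        return None
    pos = actor.end()
    clauses = []
    while pos < len(text):
        move = MOVE_RE.match(text, pos)
        if not move:
            return None  # leftover input that is not a move: parse fails
        finger = FINGER_RE.match(text, move.end())
        if not finger:
            return None
        pos = finger.end()
        attrs = []
        while True:
            attr = attr_re.match(text, pos)
            if not attr:
                break
            attrs.append(ATTRS[attr.group(1)])
            pos = attr.end()
        clauses.append(" ".join(
            [MOVES[move.group(1)], *attrs, FINGERS[finger.group(1)], "string"]))
    if not clauses:
        return None
    return FINGERS[actor.group(1)] + " " + ", ".join(clauses)

for line in ("1pu2nmu5cni", "1pu2nmmu5cni"):
    print(line, "->", parse(line, ATTR_GREEDY), "|", parse(line, ATTR_LOOKAHEAD))
```

In this sketch the greedy attribute rule fails on both inputs, because it consumes the 'm' of 'mu' as 'middle' and then cannot find a move. The lookahead version declines 'm' only when 'mu' or 'pu' starts at that position, so '2nmu' and '2nmmu' both come out as in the listing above, with or without white space.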