[Fwd: Re: (SPAM?) space-separated tokens (FAQ?)]
Scott wrote:

    1pu2nmmu5cni
    thumb pick up near forefinger string

And doesn't work because no break. Way number two is:

    lat:(i | o | m ...!rel_move ) {$SFNParse::abbrevs{$item[1]};}

which is the lookahead I mentioned previously.

Whoops! Now for the two previous lines I get:

    thumb pick up near forefinger string

for both, which is incorrect. Looks like I'll have to adopt method 1.

I would still recommend method 2. The example fix was not meant as a bulletproof fix that solves all of your problems, but as an illustration of how lookahead can help resolve some of the ambiguities. How helpful lookahead can be really depends on the exact nature of the ambiguity, as your "together" vs. "to get her" example illustrates. To correctly parse your example sentence you have to not only tokenize it correctly and correctly interpret the semantics, but you also have to *understand* that the sentence is probably referring to a stream as a sequence of things together rather than as a flowing body of water that you get her in, using a bunch of glommed things to do it with.

As you can tell by now, I *really* don't like depending on white space as a token separator. And yes, it does take on somewhat of a religious bent... It takes a bit more effort to figure it out, but one can usually resolve a problem without enforced white space. I don't know all of your grammar, and given your simple test case my guess would be that it is relatively complex. Figuring out exactly how and where to put in the lookahead conditions takes a bit of thought.

I made a small change to my way #2:

    lat: (i|o|(...!rel_move m)){$SFNParse::abbrevs{$item[1]};}

Notice what a difference it makes:

    1 pu 5f
    thumb pick up far little finger string

    1 pu 5tfo
    thumb pick up top far outer little finger string

    1 mu 2n pu 5cni
    thumb move under near forefinger string, pick up center near inner little finger string

    1 pu 2n mu 5cni # this is wrong
    thumb pick up near forefinger string, move under center near inner little finger string (this is wrong)

    1pu2nmu5cni # this is wrong
    thumb pick up near forefinger string, move under center near inner little finger string (this is wrong)

    1mu2npu5cni # Yucch! But it parses
    thumb move under near forefinger string, pick up center near inner little finger string (Yucch! But it parses)

    1 pu 2nm mu 5cni
    thumb pick up near middle forefinger string, move under center near inner little finger string

    1pu2nmmu5cni
    thumb pick up near middle forefinger string, move under center near inner little finger string
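
The difference between checking the lookahead after the `m` (`m ...!rel_move`) and before it (`...!rel_move m`) can be sketched with plain Perl regexes. This is only an analogy, not the Parse::RecDescent grammar itself: `$rel_move` is a hypothetical stand-in, since the thread never shows the real `rel_move` rule, and `takes_m` is a helper invented for illustration.

```perl
use strict;
use warnings;

# ASSUMPTION: stand-in for the rel_move rule, which the thread never shows.
my $rel_move = qr/(?:pu|mu)/;

# "m ...!rel_move": consume m, then insist that no move follows it.
my $check_after  = qr/^m(?!$rel_move)/;
# "...!rel_move m": insist that no move starts here, then consume m.
my $check_before = qr/^(?!$rel_move)m/;

# Returns 1 if the text left over after "2n" starts with a modifier m, else 0.
sub takes_m {
    my ($re, $rest) = @_;
    return $rest =~ $re ? 1 : 0;
}

# In "2nmmu5cni" the first m is a modifier and the second m starts the move "mu".
printf "after,  mmu5cni: %d\n", takes_m($check_after,  'mmu5cni');  # 0: rejected, "mu" follows the m
printf "before, mmu5cni: %d\n", takes_m($check_before, 'mmu5cni');  # 1: "mm" is not a move
# In "2nmu5cni" the m belongs to the move and must be left alone.
printf "before, mu5cni:  %d\n", takes_m($check_before, 'mu5cni');   # 0: "mu" is a move
```

Under these assumptions, the check placed after the `m` throws away a legitimate modifier whenever a move follows it, while the check placed before the `m` only refuses an `m` that is itself the start of a move.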
Re: (SPAM?) space-separated tokens (FAQ?)
Scott wrote:

On Tue, Jun 28, 2005 at 09:58:56AM -0700, Ron Smith wrote:

Well, here are my results (where test.pl was the file I cut and pasted into the original email):

    [0 ~/string/spl]$ perl ./test.pl
    thumb pick up far little finger string
    thumb pick up top far outer little finger string
    thumb move under near forefinger string, pick up center near inner little finger string
    thumb pick up near middle forefinger string
    thumb move under near forefinger string, pick up center near inner little finger string (Yucch!)

Oops, my bad. I didn't copy package SFNParse;. Makes all the difference...

Fundamentally you need to decide if white-space is part of your grammar.

As is evident from my question, it is.

No, see that is the point, it is not evident as there is more than one way to do it, and one of those ways may not really require white space.

We WANT white space. This is the way we want to do it. How do we do it? That was the question.

NO, we *don't* want the white space, if we can avoid it.

Hmmm. Is it 'together' or 'to get her'? Who is she? Who's on first?

Touché.

Isthatreallyhowyoureadtext?IfsothenIcanreallysaveawholelotofwearandtearonmythumbsbynotbotheringtoeverpressthespacebaronthiskeyboard!Thankyouverymuchforthishelp,Iwilltreasureitalways.Wasthata'spacebar'ora'spacebaron'?Whocares,asthereisnospace.Wewantspacescanyoutellushoworisitjustnotapossibility?

Originally, I debated whether or not to respond to this in the above, without any white space.

I've proven it to myself. Above run _was_ done in emacs. (Is there any other editor?) Sorry it doesn't seem to work out on your setup

No, there is no other editor. At least you belong to the True Religion.

If anyone on this list can address the question of how best to attack input as a series of space-separated tokens insteadofasteadystreamofcharacters, please let me know

How ironic that I am the only one responding...

Thanks, Scott.

OK, there is more than one way to do it.

Way number one:

    lat:(i | o | /m\b/) {$SFNParse::abbrevs{$item[1]};}
    .^.^^^ forces a break, usually with white space given your grammar.

Doing this gives:

    thumb pick up far little finger string
    thumb pick up top far outer little finger string
    thumb move under near forefinger string, pick up center near inner little finger string
    thumb pick up near forefinger string, move under center near inner little finger string (this is now right - maybe?)
    thumb move under near forefinger string, pick up center near inner little finger string (Yucch! But it parses)

Which I think is what you want. (You never *did* say what you were looking *for*. So I'm guessing here.)

Way number two is:

    lat:(i | o | m ...!rel_move ) {$SFNParse::abbrevs{$item[1]};}

which is the lookahead I mentioned previously. Now note that it *still* works, andevenworksinyournowhitespacecasethatyousodetest:

    1 pu 5f
    thumb pick up far little finger string

    1 pu 5tfo
    thumb pick up top far outer little finger string

    1 mu 2n pu 5cni
    thumb move under near forefinger string, pick up center near inner little finger string

    1 pu 2n mu 5cni # this is now right
    thumb pick up near forefinger string, move under center near inner little finger string (this is wrong)

    1pu2nmu5cni # this is the no white space case
    thumb pick up near forefinger string, move under center near inner little finger string (this is wrong)

    1mu2npu5cni # Yucch! But it parses
    thumb move under near forefinger string, pick up center near inner little finger string (Yucch! But it parses)

Noticethatwhitespacehasnothingtodowithitsavingwearandtearonboththespacebaraswellasyourthumb.
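
To see why the `/m\b/` in way number one ties the grammar to a break after the `m`, here is a minimal plain-Perl sketch. It tests only the regex, outside any grammar; `m_with_break` is a helper invented for illustration and the sample strings are made-up fragments of the inputs discussed in this thread.

```perl
use strict;
use warnings;

# Returns 1 if $rest begins with a stand-alone m, as /m\b/ requires, else 0.
sub m_with_break {
    my ($rest) = @_;
    return $rest =~ /^m\b/ ? 1 : 0;
}

# 'm mu 5cni' : a space follows the m, so \b matches             -> 1
# 'mmu5cni'   : another word character follows, no boundary      -> 0
# 'mu 5cni'   : the m is glued to the u of the move, no boundary -> 0
for my $rest ('m mu 5cni', 'mmu5cni', 'mu 5cni') {
    printf "'%s' => %d\n", $rest, m_with_break($rest);
}
```

The second case is the run-together `2nmmu5cni` form: with `\b` the modifier `m` is only recognized when something other than a word character comes after it, which in practice means white space.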
Re: (SPAM?) space-separated tokens (FAQ?)
Scott wrote:

On Mon, Jun 27, 2005 at 10:19:22AM -0700, Ron Smith wrote:

Scott, it is bad form to post code that you have not tested. I copied the above verbatim into an editor and every line in your test data causes an error message. Moreover, there is nothing in your grammar that handles comments.

Hmmm. I had tested the code before sending, and it worked fine. I know better than to post untested code.

Well, what I did was simple. I cut and pasted your code into an editor, no more, no less. I got this result:

    Error: 1 pu 5f
    Error: 1 pu 5tfo
    Error: 1 mu 2n pu 5cni
    Error: 1 pu 2n mu 5cni
    Error: 1mu2npu5cni

Second, it seems that what you want to parse is inherently ambiguous because there is no obvious difference between n mu and nm when you discount white space.

Right

Fundamentally you need to decide if white-space is part of your grammar.

As is evident from my question, it is.

No, see that is the point, it is not evident as there is more than one way to do it, and one of those ways may not really require white space.

notjustawholebunchofstuffglommedtogetherinonestream;guessIwaswrong.

It is interesting that the above is not ambiguous! While it may be difficult to read, it is clearly not ambiguous. Starting from left to right, you have:

    n

but this is not a word. Next it could be:

    no

which is a word, so we have a possibility here. But when you accept no as a word, the remainder of the sentence starting with tjust... cannot be completely and totally broken into words. So in the end, a production is forced to accept not as the first word, simply because it is the only way to allow a production to find a second word. And so on. Parsing the above sentence does *not* require white space even though there are specific instances of ambiguity such as no vs. not.

Thanks for the crumbs, Scott.

If you want more than crumbs, post code that you can prove to yourself can be cut and pasted into an editor such as emacs and run without any modification.
It is hard enough to reverse engineer someone's intent in a piece of code when it works, let alone try to figure it out when it doesn't.
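
The left-to-right argument about no vs. not can be tried out with a small backtracking splitter in plain Perl. This is a sketch under stated assumptions, not anything posted in the thread: the word list `%dict` is a tiny hypothetical dictionary chosen to cover the run-together sentence, and `segmentations` is a name invented here.

```perl
use strict;
use warnings;

# ASSUMPTION: a tiny hypothetical dictionary, just big enough for the sentence.
my %dict = map { $_ => 1 } qw(
    no not just a whole bunch of stuff glommed
    together to get her in on one stream
);

# Return every way to split $text into dictionary words, shortest prefix first.
# Each result is an array ref of words; a dead end yields an empty list.
sub segmentations {
    my ($text, $dict) = @_;
    return ([]) if $text eq '';              # nothing left: one (empty) split
    my @results;
    for my $len (1 .. length $text) {
        my $word = substr $text, 0, $len;
        next unless $dict->{$word};          # prefix is not a word: skip it
        # Keep this word only if the rest of the text can also be split.
        for my $tail (segmentations(substr($text, $len), $dict)) {
            push @results, [ $word, @$tail ];
        }
    }
    return @results;
}

my $sentence = 'notjustawholebunchofstuffglommedtogetherinonestream';
my @all = segmentations($sentence, \%dict);
print join(' ', @$_), "\n" for @all;
# not just a whole bunch of stuff glommed to get her in one stream
# not just a whole bunch of stuff glommed together in one stream
```

With this word list, "no" is tried first and abandoned because "tjust..." cannot be split, so every surviving split begins with "not". The "together" vs. "to get her" choice survives as two complete splits, which is the kind of ambiguity that tokenizing alone does not settle.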