Re: (SPAM?) space-separated tokens (FAQ?)

2005-06-30 Scott
On Wed, Jun 29, 2005 at 09:01:19AM -0700, Ron Smith wrote:
 
 NO, we *don't* want the white space, if we can avoid it.

Okay, but why? Or would that be an entire treatise?
 
 Hmmm. Is it 'together' or 'to get her'? Who is she? Who's on first?
 
 Touché.
 
So would a spaceless grammar have to make special provisions for any
and all foreseeable ambiguities such as this?

 
 Isthatreallyhowyoureadtext?IfsothenIcanreallysaveawholelotofwearandtearonmythumbsbynotbotheringtoeverpressthespacebaronthiskeyboard!Thankyouverymuchforthishelp,Iwilltreasureitalways.Wasthata'spacebar'ora'spacebaron'?Whocares,asthereisnospace.Wewantspacescanyoutellushoworisitjustnotapossibility?
 
 Originally, I debated whether or not to write my response to the above
 without any white space.

Glad you changed your mind.

 
 
 OK, there is more than one way to do it.
 
 Way number one:
 
 lat:  (i | o | /m\b/) {$SFNParse::abbrevs{$item[1]};}
                ^^^^^
 
 forces a break, usually with white space given your grammar.  Doing 
 this gives:

Okay! So any character in our grammar which conceivably could cause
'overlapping' interpretations can be 'escaped' by enforcing a break.
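A minimal sketch of what the break in way number one does, written in plain Perl rather than as a Parse::RecDescent rule. The helper name m_is_middle_way1 is hypothetical. It only checks whether the /m\b/ alternative would accept an 'm' at a given position. The \b needs a word boundary right after the 'm', such as a space or the end of the input.

```perl
use strict;
use warnings;

# Hypothetical helper: would the /m\b/ alternative of the lat rule accept
# the 'm' at position $pos of $text?  \b requires a word boundary right
# after the 'm', e.g. a following space or the end of the input.
sub m_is_middle_way1 {
    my ($text, $pos) = @_;
    return substr($text, $pos) =~ /^m\b/ ? 1 : 0;
}

print m_is_middle_way1('2nm mu', 2), "\n";  # prints 1: space after 'm', so "middle"
print m_is_middle_way1('2nm mu', 4), "\n";  # prints 0: the 'm' of 'mu' is not a lateral
print m_is_middle_way1('2nmmu',  2), "\n";  # prints 0: no break after 'm', rejected
```

With no space after the 'm' in 2nmmu, the \b check fails. This is why the run-together input 1pu2nmmu5cni does not come out as "near middle" under this method.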

 
   thumb pick up  far  little finger  string
   thumb pick up  top far outer  little finger  string
   thumb move under near  forefinger  string, pick up  center near inner  little finger  string
   thumb pick up  near  forefinger  string, move under center near inner  little finger  string  (this is now right - maybe?)
   thumb move under near  forefinger  string, pick up  center near inner  little finger  string  (Yucch! But it parses)
 
 
 Which I think is what you want.  (You never *did* say what you were
 looking *for*, so I'm guessing here.)
 

That is correct. Here is perhaps a better illustration, using the
break method:

1 pu 2nm mu 5cni

thumb pick up  near middle  forefinger  string, move under center near inner  little finger  string

   Correct.

1pu2nmmu5cni

thumb pick up  near  forefinger  string  

  And it doesn't work because there is no break.


 Way number two is:
 
 lat:  (i | o | m ...!rel_move ) {$SFNParse::abbrevs{$item[1]};}
 
 which is the lookahead I mentioned previously.
 

Whoops! Now for the two previous lines I get:

  thumb pick up  near  forefinger  string   

for both, which is incorrect. Looks like I'll have to adopt method 1.

Hey, thank you for the help. Sorry we got off on the wrong foot
initially (or wrong finger, I guess... Actually, the grammar used to
have a 'foot' in it as a body part, so you could say 'tr F O'. Except
it would probably come out 'transfer foots to mouth' due to some
semi-kludgy plural-handling code I expunged from the sample...).

Scott


[Fwd: Re: (SPAM?) space-separated tokens (FAQ?)]

2005-06-30 Ron Smith

Scott wrote:



1pu2nmmu5cni

thumb pick up  near  forefinger  string  


  And it doesn't work because there is no break.




Way number two is:

lat:(i | o | m ...!rel_move ) {$SFNParse::abbrevs{$item[1]};}

which is the lookahead I mentioned previously.




Whoops! Now for the two previous lines I get:

  thumb pick up  near  forefinger  string   


for both, which is incorrect. Looks like I'll have to adopt method 1.



I would still recommend method 2.  What I was trying to show in the
example fix is not a bulletproof fix that solves all of your problems,
but an illustration of how lookahead can help resolve some of the
ambiguities.  How helpful lookahead can be really depends on the exact
nature of the ambiguity, as your 'together' / 'to get her' example
illustrates.  To parse your example sentence correctly, you have to not
only tokenize it and interpret its semantics correctly, but also
*understand* that the sentence is probably referring to a stream as a
sequence of things together, rather than as a flowing body of water
that you get her in, using a bunch of glommed things to do it with.

As you can tell by now, I *really* don't like depending on white space
as a token separator.  And yes it does take on somewhat of a religious
bent... It takes a bit more effort to figure it out, but one can usually
resolve a problem without enforced white space.

I don't know all of your grammar, but judging from even your simple test
case, my guess is that it is relatively complex.  Figuring out exactly how
and where to put in the lookahead conditions takes a bit of thought.

I made a small change to my way #2:

lat: (i|o|(...!rel_move  m)){$SFNParse::abbrevs{$item[1]};}

Notice what a difference it makes:

1 pu 5f
  thumb pick up  far  little finger  string
1 pu 5tfo
  thumb pick up  top far outer  little finger  string
1 mu 2n pu 5cni
  thumb move under near  forefinger  string, pick up  center near inner  little finger  string
1 pu 2n mu 5cni # this is wrong
  thumb pick up  near  forefinger  string, move under center near inner  little finger  string  (this is wrong)
1pu2nmu5cni # this is wrong
  thumb pick up  near  forefinger  string, move under center near inner  little finger  string  (this is wrong)
1mu2npu5cni # Yucch! But it parses
  thumb move under near  forefinger  string, pick up  center near inner  little finger  string  (Yucch! But it parses)
1 pu 2nm mu 5cni
  thumb pick up  near middle  forefinger  string, move under center near inner  little finger  string
1pu2nmmu5cni
  thumb pick up  near middle  forefinger  string, move under center near inner  little finger  string
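The only difference between the two versions of way number two is where the negative lookahead sits relative to the 'm'. The sketch below models both versions with Perl regex lookahead (?!...) instead of Parse::RecDescent's ...! operator. The thread never shows the rel_move rule, so here it is assumed, purely hypothetically, to match the two-letter move code 'mu'. The \s* roughly stands in for the whitespace Parse::RecDescent skips by default before each item. Both helper names are made up for the example.

```perl
use strict;
use warnings;

# ASSUMPTION: rel_move is not shown in the thread; here it is a hypothetical
# stand-in that matches the two-letter move code 'mu'.
my $rel_move = qr/mu/;

# "m ...!rel_move": consume the 'm' first, then require that no rel_move
# follows.  The \s* roughly stands in for Parse::RecDescent's default
# skipping of whitespace before the lookahead.
sub middle_consume_then_check {
    my ($text, $pos) = @_;
    return substr($text, $pos) =~ /^m(?!\s*$rel_move)/ ? 1 : 0;
}

# "...!rel_move m": first require that no rel_move starts here, then consume the 'm'.
sub middle_check_then_consume {
    my ($text, $pos) = @_;
    return substr($text, $pos) =~ /^(?!$rel_move)m/ ? 1 : 0;
}

# Each case: input, position of an 'm', and whether that 'm' should mean "middle".
for my $case (['2nmu', 2, 0], ['2nm mu', 2, 1], ['2nmmu', 2, 1], ['2nmmu', 3, 0]) {
    my ($text, $pos, $want) = @$case;
    printf "%-7s m at %d  want=%d  consume-then-check=%d  check-then-consume=%d\n",
        $text, $pos, $want,
        middle_consume_then_check($text, $pos),
        middle_check_then_consume($text, $pos);
}
```

Under that assumption, the check-then-consume version refuses an 'm' as a lateral only when a 'mu' starts at that exact position. So it accepts the middle 'm' in both 2nm mu and 2nmmu, and still leaves the 'm' of 2nmu for move under. The consume-then-check version gets the first three cases wrong.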



Re: (SPAM?) space-separated tokens (FAQ?)

2005-06-29 Ron Smith

Scott wrote:

On Tue, Jun 28, 2005 at 09:58:56AM -0700, Ron Smith wrote:

Well, here are my results (where test.pl was the file I cut and pasted
into the original email):

[0 ~/string/spl]$ perl ./test.pl
  thumb pick up  far  little finger  string  
  thumb pick up  top far outer  little finger  string  
  thumb move under near  forefinger  string, pick up  center near inner  little finger  string  
  thumb pick up  near middle  forefinger  string  
  thumb move under near  forefinger  string, pick up  center near inner  little finger  string  (Yucch!)


Oops, my bad.  I didn't copy the 'package SFNParse;' line, which makes
all the difference...




Fundamentally you need to decide if white-space is part of your
grammar. 



As is evident from my question, it is.


No, see that is the point, it is not evident as there is more than one 
way to do it, and one of those ways may not really require white space.



We WANT white space. This is the way we want to do it. How do we do
it? That was the question.


NO, we *don't* want the white space, if we can avoid it.



Hmmm. Is it 'together' or 'to get her'? Who is she? Who's on first?


Touché.



Isthatreallyhowyoureadtext?IfsothenIcanreallysaveawholelotofwearandtearonmythumbsbynotbotheringtoeverpressthespacebaronthiskeyboard!Thankyouverymuchforthishelp,Iwilltreasureitalways.Wasthata'spacebar'ora'spacebaron'?Whocares,asthereisnospace.Wewantspacescanyoutellushoworisitjustnotapossibility?


Originally, I debated whether or not to write my response to the above
without any white space.



I've proven it to myself. The above run _was_ done in emacs. (Is there any
other editor?) Sorry it doesn't seem to work out on your setup.


No, there is no other editor.  At least you belong to the True Religion.



If anyone on this list can address the question of how best to attack
input as a series of space-separated tokens
insteadofasteadystreamofcharacters, please let me know 


How ironic that I am the only one responding...



Thanks,
Scott.




OK, there is more than one way to do it.

Way number one:

lat:  (i | o | /m\b/) {$SFNParse::abbrevs{$item[1]};}
               ^^^^^

forces a break, usually with white space given your grammar.  Doing 
this gives:


  thumb pick up  far  little finger  string
  thumb pick up  top far outer  little finger  string
  thumb move under near  forefinger  string, pick up  center near inner  little finger  string
  thumb pick up  near  forefinger  string, move under center near inner  little finger  string  (this is now right - maybe?)
  thumb move under near  forefinger  string, pick up  center near inner  little finger  string  (Yucch! But it parses)



Which I think is what you want.  (You never *did* say what you were 
looking *for*, so I'm guessing here.)


Way number two is:

lat:(i | o | m ...!rel_move ) {$SFNParse::abbrevs{$item[1]};}

which is the lookahead I mentioned previously.

Now note that it *still* works, 
andevenworksinyournowhitespacecasethatyousodetest:


1 pu 5f
  thumb pick up  far  little finger  string

1 pu 5tfo
  thumb pick up  top far outer  little finger  string

1 mu 2n pu 5cni
  thumb move under near  forefinger  string, pick up  center near inner  little finger  string


1 pu 2n mu 5cni # this is now right
  thumb pick up  near  forefinger  string, move under center near inner  little finger  string  (this is wrong)


1pu2nmu5cni # this is the no white space case
  thumb pick up  near  forefinger  string, move under center near inner  little finger  string  (this is wrong)


1mu2npu5cni # Yucch! But it parses
  thumb move under near  forefinger  string, pick up  center near inner  little finger  string  (Yucch! But it parses)



Noticethatwhitespacehasnothingtodowithitsavingwearandtearonboththespacebaraswellasyourthumb.


Re: (SPAM?) space-separated tokens (FAQ?)

2005-06-28 Ron Smith

Scott wrote:

On Mon, Jun 27, 2005 at 10:19:22AM -0700, Ron Smith wrote:

Scott, it is bad form to post code that you have not tested.  I copied 
the above verbatim into an editor and every line in your test data 
causes an error message.  Moreover, there is nothing in your grammar 
that handles comments.



Hmmm. I had tested the code before sending, and it worked fine. I know
better than to post untested code.


Well, what I did was simple.  I cut and pasted your code into an editor, 
no more, no less.


I got this result:

Error: 1 pu 5f
Error: 1 pu 5tfo
Error: 1 mu 2n pu 5cni
Error: 1 pu 2n mu 5cni
Error: 1mu2npu5cni




Second, it seems that what you want to parse is inherently ambiguous 
because there is no obvious difference between n mu and nm when you 
discount white space.



Right



Fundamentally you need to decide if white-space is part of your
grammar. 



As is evident from my question, it is.


No, see that is the point, it is not evident as there is more than one 
way to do it, and one of those ways may not really require white space.



notjustawholebunchofstuffglommedtogetherinonestream;guessIwaswrong.


It is interesting that the above is not ambiguous!  While it may be 
difficult to read, it is clearly not ambiguous.


Starting from left to right, you have:
n
but this is not a word.
Next it could be:
no
which is a word, so we have a possibility here.  But when you accept 
'no' as a word, the remainder of the sentence, starting with 'tjust...', 
cannot be completely and totally broken into words.  So in the end, a 
production is forced to accept 'not' as the first word, simply because 
it is the only choice that allows a production to find a second word.  And so 
on.  Parsing the above sentence does *not* require white space, even 
though there are specific instances of ambiguity such as 'no' vs. 'not'.
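The left-to-right search with backtracking described above can be sketched in a few lines of Perl. This is ordinary depth-first word segmentation, not Parse::RecDescent. The tiny dictionary is made up for the example. It contains both 'no' and 'not', plus 'who' as a second distractor inside 'whole'.

```perl
use strict;
use warnings;

# Toy dictionary, made up for this example: both 'no' and 'not' are words,
# and 'who' is a further distractor inside 'whole'.
my %dict = map { $_ => 1 } qw(no not just a whole who bunch of stuff);

# Return every way to split $text into dictionary words, trying prefixes
# from left to right and backtracking when the remainder cannot be split.
sub segmentations {
    my ($text) = @_;
    return ([]) if $text eq '';                 # nothing left: one empty split
    my @results;
    for my $len (1 .. length $text) {
        my $head = substr($text, 0, $len);
        next unless $dict{$head};               # prefix is not a word
        for my $rest (segmentations(substr($text, $len))) {
            push @results, [$head, @$rest];     # keep it only if the rest splits too
        }
    }
    return @results;
}

for my $split (segmentations('notjustawholebunchofstuff')) {
    print join(' ', @$split), "\n";
}
```

On this prefix of Scott's run-together sentence, the 'no' branch fails because no dictionary word starts 'tjust...'. The 'who' branch fails at 'le'. That leaves 'not just a whole bunch of stuff' as the only segmentation returned.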




Thanks for the crumbs,
Scott.



If you want more than crumbs, post code that you can prove to yourself 
can be cut and pasted into an editor such as emacs and run without any 
modification.  It is hard enough to reverse engineer someone's intent in 
a piece of code when it works, let alone try to figure it out when it 
doesn't.


Re: (SPAM?) space-separated tokens (FAQ?)

2005-06-27 Scott
On Mon, Jun 27, 2005 at 10:19:22AM -0700, Ron Smith wrote:
 
 Scott, it is bad form to post code that you have not tested.  I copied 
 the above verbatim into an editor and every line in your test data 
 causes an error message.  Moreover, there is nothing in your grammar 
 that handles comments.

Hmmm. I had tested the code before sending, and it worked fine. I know
better than to post untested code. 

 
 Second if you really want help, telling the potential helper to go ex 
 R3 Up themselves is also not particularly helpful. Ex R3 up to you 
 too... Ex R3 io for the horse you rode in on.

cute

 
 But assuming that you meant this somewhat in jest and that your mother 
 didn't raise you to have enough sense to be gracious, I'll provide a few 
 bread crumbs.

Whoa. What I said was that, if my post was in bad form,
you were free to tell _me_ 'ex R3 up'. I was trying to be gracious in
apologising in advance for intruding into the world of experts. Viz:

I've tried to distill the grammar down for this post, but again, I
apologize if it is too long to digest. If you don't like it, you can
 ^^^
say, in our full grammar: ex R3 up.

Please read before ranting. And you can leave my mother out of this,
thanks.

 Second, it seems that what you want to parse is inherently ambiguous 
 because there is no obvious difference between n mu and nm when you 
 discount white space.

Right

 
 Fundamentally you need to decide if white-space is part of your
 grammar. 

As is evident from my question, it is.

 If so, then what will help to disambiguate this case is a look-ahead 
 that comprehends the white-space.  If white space is not truly required, 
 then what you need is still a look-ahead, but something that parses 
 ahead to see if the other case is what *will* happen.

I guess I was asking for the 'look-ahead that comprehends
white-space'. I always thought a grammar was handed 'tokens',
notjustawholebunchofstuffglommedtogetherinonestream;guessIwaswrong.
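For comparison, the approach Scott originally asked about, handing the grammar whitespace-separated tokens, can be sketched without Parse::RecDescent at all. First split the input on whitespace, then interpret each token on its own. The code tables below are hypothetical and are reconstructed only from the examples quoted in this thread. describe() is a made-up helper name.

```perl
use strict;
use warnings;

# Hypothetical code tables, reconstructed only from examples in this thread.
my %finger = (1 => 'thumb', 2 => 'forefinger', 5 => 'little finger');
my %action = (pu => 'pick up', mu => 'move under');
my %lat    = (n => 'near', f => 'far', t => 'top', o => 'outer',
              i => 'inner', c => 'center', m => 'middle');

# Split on whitespace first, then interpret each token independently.
sub describe {
    my ($input) = @_;
    my @words;
    for my $tok (split ' ', $input) {          # split ' ' splits on runs of whitespace
        if (exists $action{$tok}) {
            push @words, $action{$tok};
            next;
        }
        # Otherwise expect a finger digit followed by zero or more laterals.
        my ($digit, $letters) = $tok =~ /^(\d)([a-z]*)$/
            or die "unrecognized token '$tok'\n";
        die "unknown finger '$digit'\n" unless exists $finger{$digit};
        my @lats;
        for my $ch (split //, $letters) {
            die "unknown lateral '$ch' in '$tok'\n" unless exists $lat{$ch};
            push @lats, $lat{$ch};
        }
        push @words, join(' ', @lats, $finger{$digit});
    }
    return @words;
}

print join(' | ', describe('1 pu 2nm mu 5cni')), "\n";
```

With whitespace required, 2nm and mu are separate tokens, so the n mu vs. nm ambiguity never comes up. The cost is that run-together input such as 1pu2nmmu5cni is rejected outright instead of being parsed.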

 
 However, in general, designing ambiguous grammars is something to
 avoid.

Fruit flies like a banana.

Thanks for the crumbs,
Scott.