Re: [antlr-dev] TokenRewriteStream 2nd question: targeted for Terence

Terence Parr Mon, 02 Jun 2008 15:14:27 -0700

Jeff, thanks so much for jumping into this.  You've identified a  
fundamental flaw in my rewrite engine.  I worked with Leon Su and  
Shaoting Cai today, two of my grad students, and we've figure out how  
to guarantee correctness and efficiency. I'll try to impl. right  
now...might be too much for this week...
Ter
On May 28, 2008, at 5:56 PM, Jeff Saremi wrote:


>
> --- Terence Parr <[EMAIL PROTECTED]> wrote:
>
>>
>> Ok, maybe we should take a different approach. lets
>> keep it in the
>> style we have now,  order of operations is index
>> order and reverse
>> time at that index. If there is something that we
>> don't really know
>> what to do with like overlapping replacements and
>> insertions, perhaps
>> I can throw an exception. The general case is that
>> we do insertions in
>> one place and replacements in a different place.
>> Replacements and
>> insertions are usually nested operations; they do
>> not overlap. For
>> example, you are replacing and/or inserting tokens
>> inside an
>> expression and then you want to rewrite the
>> enclosing expression to
>> use := instead of = for assignment. the replacement
>> must see all of
>> the replacements done inside the expression.
>>
>> 0 1 2 3 4 5 6 7
>> x = 3 * 1 + 0 ;
>>
>> Here, we replace 3*1 -> 3 and then 3+0 -> 3 then
>> replace x = <expr>;
>> with x:=<expr> ;. The operations are r2-4, r2-6,
>> r0-7. They are not
>> overlapping, they are nested. Tell you the truth, I
>> think that
>> overlapping replacements are probably an error.
>> Replacements occur at
>> the abstract sub phrase level; I can't imagine a
>> situation where I
>> want to do replacement from halfway in one sub
>> phrase to halfway
>> through the next sub phrase. You replace an entire
>> statement not
>> halfway through one and into the other.
>>
>
> You really put things in perspective for me.
> As you said, the order in time is important and also
> what's important is the nested concept behind the
> operations.
>
> I'd like to put this differently. And I'll show that
> by referring to the "foobar" example. I also would
> like to ask us to forget about the performance for now
> until we come and revisit it. The reason is we want to
> get to the rationalization and the original logic and
> not the way the code is implemented.
>
> Given the token stream "abc" the reason
> insert(2,'foo') and insert(2, 'bar') result in
> "abbarfooc" is that after  executing insert(2,'foo)
> the stream is changed. The "new" token that index 2 is
> referring to is now "b" of "bar" and not the original
> "c". That means our stream got first transformed to
> "abfooc" and then after the insertion of 'bar' at the
> new index 2 we ended up with "abbarfooc".
>
> This is consistent with what Terence said about r2-4,
> r2-6 and r0-7 above.
>
> Having accepted that every operation works on the
> result of the previous operation we can move on with
> the following example:
>
> Original Tokens: "abcde"
> Now we get a whole bunch Rewrite ops that were added
> independent of one another during (potentially)
> different rule interpretations:
> (i=insert, r=replace, T=Time)
> T0: i(2,'m')
> T1: i(2,'n')
> T3: r(0,'o',2)
> T4: i(0, 'p')
>
> Here's what the newly revisited logic would produce:
>                0 1 2 3 4 5 6
>                a b c d e
> T0: i(2,'m')    a b m c d e
> T1: i(2,'n')    a b n m c d e
> T3: r(0,'o',2)  o m c d e
> T4: i(0, 'p')   p o m c d e
> Result: "pomcde"
>
> But the existing code would probably produce the
> following (look at how the ops are re-ordered based on
> the index):
>                0 1 2 3 4 5 6
>                a b c d e
>  : i(0, 'p')   p a b c d e
>  : r(0,'o',2)  p o d e
>  : i(2,'n')    // no effect - index jumped to 3
>  : i(2,'m')    // no effect - index jumped to 3
> Result: "pode"
>
> The point is, there should be no grouping of
> operations based on the index (again leave the
> specific implementations out). All operations are
> valid. A Replace at some index does not overwrite the
> previous Replace ops at the same index. An insert
> coming later but with a lower index is not output
> ahead of another insert which was placed earlier but
> with a higher index.
>
> Now how do we do this? My first instinct was to
> re-index the ops as they arrive. A little risky and
> confusing, if doable at all.
>
> All we want to do is avoid expanding/shrinking the
> token array. That's a costly operation.
>
> However, another thought is to always keep the tokens
> in a Linked List as opposed to an Array. That way we
> could easily insert new items at a cost of O(1)?
> Replaces can also be done easily by linking the newly
> inserted item to the item falling after the
> "lastIndex" of that op.
>
> If this is sound, token rewrite could easily be done
> by first maintaining a Stack of operations (new ones
> are pushed onto the top) and then doing a loop over or
> a recursive call. Here's a simplistic recursive
> implementation:
> (the method op.exec() takes a Linked List, executes
> itself into that list and then returns the updated
> list)
>
> newlyRewrittenTokenList = rewriteTokens(ops, tokens) {
>    op = ops.pop();
>    if(ops.empty()) return op.exec(tokens);
>    return op.exec(rewriteTokens(ops, tokens));
> }
>
> Waiting for some feedback.
> jeff
>
> _______________________________________________
> antlr-dev mailing list
> [email protected]
> http://www.antlr.org:8080/mailman/listinfo/antlr-dev

_______________________________________________
antlr-dev mailing list
[email protected]
http://www.antlr.org:8080/mailman/listinfo/antlr-dev

Re: [antlr-dev] TokenRewriteStream 2nd question: targeted for Terence

Reply via email to