Re: [antlr-dev] TokenRewriteStream 2nd question: targeted for Terence

Jeff Saremi Wed, 28 May 2008 17:56:50 -0700

--- Terence Parr <[EMAIL PROTECTED]> wrote:

> 
> Ok, maybe we should take a different approach. lets
> keep it in the  
> style we have now,  order of operations is index
> order and reverse  
> time at that index. If there is something that we
> don't really know  
> what to do with like overlapping replacements and
> insertions, perhaps  
> I can throw an exception. The general case is that
> we do insertions in  
> one place and replacements in a different place.
> Replacements and  
> insertions are usually nested operations; they do
> not overlap. For  
> example, you are replacing and/or inserting tokens
> inside an  
> expression and then you want to rewrite the
> enclosing expression to  
> use := instead of = for assignment. the replacement
> must see all of  
> the replacements done inside the expression.
> 
> 0 1 2 3 4 5 6 7
> x = 3 * 1 + 0 ;
> 
> Here, we replace 3*1 -> 3 and then 3+0 -> 3 then
> replace x = <expr>;  
> with x:=<expr> ;. The operations are r2-4, r2-6,
> r0-7. They are not  
> overlapping, they are nested. Tell you the truth, I
> think that  
> overlapping replacements are probably an error.
> Replacements occur at  
> the abstract sub phrase level; I can't imagine a
> situation where I  
> want to do replacement from halfway in one sub
> phrase to halfway  
> through the next sub phrase. You replace an entire
> statement not  
> halfway through one and into the other.
>


You really put things in perspective for me.
As you said, the order in time is important and also
what's important is the nested concept behind the
operations.

I'd like to put this differently. And I'll show that
by referring to the "foobar" example. I also would
like to ask us to forget about the performance for now
until we come and revisit it. The reason is we want to
get to the rationalization and the original logic and
not the way the code is implemented.

Given the token stream "abc" the reason
insert(2,'foo') and insert(2, 'bar') result in
"abbarfooc" is that after  executing insert(2,'foo)
the stream is changed. The "new" token that index 2 is
referring to is now "b" of "bar" and not the original
"c". That means our stream got first transformed to
"abfooc" and then after the insertion of 'bar' at the
new index 2 we ended up with "abbarfooc".

This is consistent with what Terence said about r2-4,
r2-6 and r0-7 above.

Having accepted that every operation works on the
result of the previous operation we can move on with
the following example:

Original Tokens: "abcde"
Now we get a whole bunch Rewrite ops that were added
independent of one another during (potentially)
different rule interpretations:
(i=insert, r=replace, T=Time)
T0: i(2,'m')
T1: i(2,'n')
T3: r(0,'o',2)
T4: i(0, 'p')

Here's what the newly revisited logic would produce:
                0 1 2 3 4 5 6
                a b c d e 
T0: i(2,'m')    a b m c d e
T1: i(2,'n')    a b n m c d e
T3: r(0,'o',2)  o m c d e
T4: i(0, 'p')   p o m c d e
Result: "pomcde"

But the existing code would probably produce the
following (look at how the ops are re-ordered based on
the index):
                0 1 2 3 4 5 6
                a b c d e 
  : i(0, 'p')   p a b c d e
  : r(0,'o',2)  p o d e
  : i(2,'n')    // no effect - index jumped to 3
  : i(2,'m')    // no effect - index jumped to 3
Result: "pode"

The point is, there should be no grouping of
operations based on the index (again leave the
specific implementations out). All operations are
valid. A Replace at some index does not overwrite the
previous Replace ops at the same index. An insert
coming later but with a lower index is not output
ahead of another insert which was placed earlier but
with a higher index.

Now how do we do this? My first instinct was to
re-index the ops as they arrive. A little risky and
confusing, if doable at all.

All we want to do is avoid expanding/shrinking the
token array. That's a costly operation.

However, another thought is to always keep the tokens
in a Linked List as opposed to an Array. That way we
could easily insert new items at a cost of O(1)?
Replaces can also be done easily by linking the newly
inserted item to the item falling after the
"lastIndex" of that op.

If this is sound, token rewrite could easily be done
by first maintaining a Stack of operations (new ones
are pushed onto the top) and then doing a loop over or
a recursive call. Here's a simplistic recursive
implementation:
(the method op.exec() takes a Linked List, executes
itself into that list and then returns the updated
list)

newlyRewrittenTokenList = rewriteTokens(ops, tokens) {
    op = ops.pop();
    if(ops.empty()) return op.exec(tokens);
    return op.exec(rewriteTokens(ops, tokens));
} 

Waiting for some feedback.
jeff

_______________________________________________
antlr-dev mailing list
[email protected]
http://www.antlr.org:8080/mailman/listinfo/antlr-dev

Re: [antlr-dev] TokenRewriteStream 2nd question: targeted for Terence

Reply via email to