On May 20, 2008, at 5:23 PM, Jeff Saremi wrote:

I'm basically hoping Terence would answer this. And since this is my first direct question i must say the following: Terence, when i came across Antlr i was just blown away by what you had created. Before that i was about to first write my own simplistic LL1 or SLR. Then i thought i might as well extend some of the work already done. So i looked at existing compilers such as Yacc and Cup. Then I decided to look at antlr. I had heard about it but never examined it. There it was: antlWorks, EBNF, and LL* parsing (i don't even know what that means but it sure is a lot more complicated than LL1 and LL2.) I'm also surprised that you still actively develop this and monitor all these forums. You should just be kicking back and letting us bid for work on this fine tool because it would be a privilege to have contributed to this whole thing.

Hell, with that kind of buttering up, I could to nothing less than respond :)

To start, i'll just copy your comments from within the code in this post. Based on what i understand, insertions tokens get priority over the replacement one. The last replacement overwrites any previous one. The insertions should later to treated with the oldest (time of insertion) to the newest.

sounds right.

So far so good. Now we get to the point where we need to insert them in the stream (toString method). Here, conceptually, i need to know if we're using a static stream (the image is frozen before any rewrite) or the image is dynamic. This is similar to the rowset cursors in databases.

are you asking about concurrent manipulation while you are doing toString()? that should be disallowed ;)

To illustrate, let's assume "t" stands or an original token. The stream would have t0 - t5 tokens. Insertions are represented as "i?" where "?" stands for its index. such as i2 and i3 are insertions at 2 and 3 respectively. Replacements (and deletions) are shown as "r?-?" where "??" stands for start and end of the replacements (index and lastIndex). For example r2-4 means replace from 2 to 4.

Our rewrite list at index 2 has the following: i2.1, i2.2 which are to be inserted in the order: 2.1 then 2.2 And we also have one single replacement at the same index 2: r2-4 (replace from 2-4)

So which one would it be?

Good question. either we do the replacement after insertion or we do the replacement before insertion. It seems to me that we need to track not only the index of the operation but also the time in which it occurred relation to other operations. with this make sense? all operations are sorted by index. Within an index they are sorted by time.

scenario a) static treatment of the stream:
before: t0 t1 t2 t3 t4 t5
after inserts: t0 t1 i2.1 i2.2 t2 t3 t4 t5
after replace: t0 t1 i2.1 i2.2 r2-4 t5

Oh, now I know what you mean by static. I would assume that the way we want this to behave in general is a buffer of characters. If you do an insertion and then a replacement, it wipes out the insertion. The buffer should behave exactly as if it were a dynamic buffer with operations happening on the fly rather than queued up. I realize now that perhaps I have, in the interest of efficiency, screwed up this obvious behavior. I sort by Index purely for efficiency reasons. The most obvious way to handle operations is: in the order they occur. In this way there is never a question about what to do when you have an insert, delete, and a replacement. the order of operations indicates what to do.

scenario b) dynamic
before: t0 t1 t2 t3 t4 t5
after inserts: t0 t1 i2.1 i2.2 t2 t3 t4 t5
after replace: t0 t1 r2-4 t3 t4 t5

This would be how operations time to by execution time would behave.

Hmm... the only question is one of efficiency now. If the order of operations is the order in which you invoke in search, replace, or delete, how do I emit the buffer efficiently? The obvious answer is simply to go modify the character buffer and then print it out, but that means all of that memory moving/copying that I'm trying to avoid.

Hmm... the other problem with doing order of operation is that it alters the token indexes. crap.

Ok, maybe we should take a different approach. lets keep it in the style we have now, order of operations is index order and reverse time at that index. If there is something that we don't really know what to do with like overlapping replacements and insertions, perhaps I can throw an exception. The general case is that we do insertions in one place and replacements in a different place. Replacements and insertions are usually nested operations; they do not overlap. For example, you are replacing and/or inserting tokens inside an expression and then you want to rewrite the enclosing expression to use := instead of = for assignment. the replacement must see all of the replacements done inside the expression.

0 1 2 3 4 5 6 7
x = 3 * 1 + 0 ;

Here, we replace 3*1 -> 3 and then 3+0 -> 3 then replace x = <expr>; with x:=<expr> ;. The operations are r2-4, r2-6, r0-7. They are not overlapping, they are nested. Tell you the truth, I think that overlapping replacements are probably an error. Replacements occur at the abstract sub phrase level; I can't imagine a situation where I want to do replacement from halfway in one sub phrase to halfway through the next sub phrase. You replace an entire statement not halfway through one and into the other.

What to think about this Jeff?

Ter
_______________________________________________
antlr-dev mailing list
[email protected]
http://www.antlr.org:8080/mailman/listinfo/antlr-dev

Reply via email to