On May 20, 2008, at 5:23 PM, Jeff Saremi wrote:
I'm basically hoping Terence would answer this. And since this is my
first direct question i must say the following:
Terence, when i came across Antlr i was just blown away by what you
had created.
Before that i was about to first write my own simplistic LL1 or SLR.
Then i thought i might as well extend some of the work already done.
So i looked at existing compilers such as Yacc and Cup. Then I
decided to look at antlr. I had heard about it but never examined it.
There it was: antlWorks, EBNF, and LL* parsing (i don't even know
what that means but it sure is a lot more complicated than LL1 and
LL2.)
I'm also surprised that you still actively develop this and monitor
all these forums.
You should just be kicking back and letting us bid for work on this
fine tool because it would be a privilege to have contributed to
this whole thing.
Hell, with that kind of buttering up, I could to nothing less than
respond :)
To start, i'll just copy your comments from within the code in this
post.
Based on what i understand, insertions tokens get priority over the
replacement one. The last replacement overwrites any previous one.
The insertions should later to treated with the oldest (time of
insertion) to the newest.
sounds right.
So far so good. Now we get to the point where we need to insert them
in the stream (toString method).
Here, conceptually, i need to know if we're using a static stream
(the image is frozen before any rewrite) or the image is dynamic.
This is similar to the rowset cursors in databases.
are you asking about concurrent manipulation while you are doing
toString()? that should be disallowed ;)
To illustrate, let's assume "t" stands or an original token. The
stream would have t0 - t5 tokens.
Insertions are represented as "i?" where "?" stands for its index.
such as i2 and i3 are insertions at 2 and 3 respectively.
Replacements (and deletions) are shown as "r?-?" where "??" stands
for start and end of the replacements (index and lastIndex). For
example r2-4 means replace from 2 to 4.
Our rewrite list at index 2 has the following: i2.1, i2.2 which are
to be inserted in the order: 2.1 then 2.2
And we also have one single replacement at the same index 2: r2-4
(replace from 2-4)
So which one would it be?
Good question. either we do the replacement after insertion or we do
the replacement before insertion. It seems to me that we need to track
not only the index of the operation but also the time in which it
occurred relation to other operations. with this make sense? all
operations are sorted by index. Within an index they are sorted by time.
scenario a) static treatment of the stream:
before: t0 t1 t2 t3 t4 t5
after inserts: t0 t1 i2.1 i2.2 t2 t3 t4 t5
after replace: t0 t1 i2.1 i2.2 r2-4 t5
Oh, now I know what you mean by static. I would assume that the way we
want this to behave in general is a buffer of characters. If you do an
insertion and then a replacement, it wipes out the insertion. The
buffer should behave exactly as if it were a dynamic buffer with
operations happening on the fly rather than queued up. I realize now
that perhaps I have, in the interest of efficiency, screwed up this
obvious behavior. I sort by Index purely for efficiency reasons. The
most obvious way to handle operations is: in the order they occur. In
this way there is never a question about what to do when you have an
insert, delete, and a replacement. the order of operations indicates
what to do.
scenario b) dynamic
before: t0 t1 t2 t3 t4 t5
after inserts: t0 t1 i2.1 i2.2 t2 t3 t4 t5
after replace: t0 t1 r2-4 t3 t4 t5
This would be how operations time to by execution time would behave.
Hmm... the only question is one of efficiency now. If the order of
operations is the order in which you invoke in search, replace, or
delete, how do I emit the buffer efficiently? The obvious answer is
simply to go modify the character buffer and then print it out, but
that means all of that memory moving/copying that I'm trying to avoid.
Hmm... the other problem with doing order of operation is that it
alters the token indexes. crap.
Ok, maybe we should take a different approach. lets keep it in the
style we have now, order of operations is index order and reverse
time at that index. If there is something that we don't really know
what to do with like overlapping replacements and insertions, perhaps
I can throw an exception. The general case is that we do insertions in
one place and replacements in a different place. Replacements and
insertions are usually nested operations; they do not overlap. For
example, you are replacing and/or inserting tokens inside an
expression and then you want to rewrite the enclosing expression to
use := instead of = for assignment. the replacement must see all of
the replacements done inside the expression.
0 1 2 3 4 5 6 7
x = 3 * 1 + 0 ;
Here, we replace 3*1 -> 3 and then 3+0 -> 3 then replace x = <expr>;
with x:=<expr> ;. The operations are r2-4, r2-6, r0-7. They are not
overlapping, they are nested. Tell you the truth, I think that
overlapping replacements are probably an error. Replacements occur at
the abstract sub phrase level; I can't imagine a situation where I
want to do replacement from halfway in one sub phrase to halfway
through the next sub phrase. You replace an entire statement not
halfway through one and into the other.
What to think about this Jeff?
Ter
_______________________________________________
antlr-dev mailing list
[email protected]
http://www.antlr.org:8080/mailman/listinfo/antlr-dev