Re: LexerInput.read() returns null characters after unicode characters.

venkatram . akkineni Thu, 27 Dec 2018 14:18:15 -0800

First of all, thank you for the detailed answer.

> LexerInput returns a primitive int. It cannot return null.


Yes sir, so I get an integer value of zero. EOF, as you know, would be -1. I 
believe it is returning a null character '\0'. 

> The editor's lexer plumbing insists that tokens be returned for every
> character in a file. If there is a mismatch, it assumes that is a bug in
> the lexer. The sum of the lengths of all tokens returned while lexing must
> match the number of characters actually in the input. It looks like your
> lexer is trying to bail out without consuming all the characters in the
> file.

Yep, that is what led me to believe there may be more characters than I am 
visually seeing in the file.

> *Your lexer* is returning a null token - signalling EOF - before the actual
> end of the file/document/input.
I only did this to see whether it would work, and that is when I discovered 
that the lexer won't allow premature termination of lexing. One has to read 
until EOF.
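
To illustrate the invariant described in the quoted text, here is a minimal 
sketch of a hand-written lexer that never bails out early: any character it 
does not recognize (including '\0') becomes an ERROR token, so the token 
lengths always add up to the input length. It works on a plain CharSequence 
rather than the real LexerInput, and the names TotalLexer, Kind and Token are 
hypothetical, made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class TotalLexer {
    // Hypothetical token kinds, for illustration only.
    enum Kind { WORD, WHITESPACE, ERROR }

    static final class Token {
        final Kind kind;
        final int length;

        Token(Kind kind, int length) {
            this.kind = kind;
            this.length = length;
        }
    }

    // Every character maps to some kind, so nothing is ever left unconsumed.
    static Kind kindOf(char c) {
        if (Character.isLetterOrDigit(c)) return Kind.WORD;
        if (Character.isWhitespace(c)) return Kind.WHITESPACE;
        return Kind.ERROR; // covers '\0' and other invisible characters
    }

    // Groups consecutive characters of the same kind into one token.
    static List<Token> lex(CharSequence input) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int start = pos;
            Kind kind = kindOf(input.charAt(pos));
            while (pos < input.length() && kindOf(input.charAt(pos)) == kind) {
                pos++;
            }
            tokens.add(new Token(kind, pos - start));
        }
        return tokens;
    }

    // The sum the editor plumbing is said to compare against the input length.
    static int totalLength(List<Token> tokens) {
        int sum = 0;
        for (Token t : tokens) {
            sum += t.length;
        }
        return sum;
    }

    public static void main(String[] args) {
        String input = "caf\u00e9\u0000 x";
        List<Token> tokens = lex(input);
        System.out.println(totalLength(tokens) == input.length()); // prints "true"
    }
}
```

A unit test can then assert totalLength(tokens) == input.length() for every 
test file, which should catch a premature EOF before the editor does.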

> If you are using ANTLR, does your grammar read the entire file? You need a
> rule that includes EOF explicitly, or it is easy to have a grammar which
> looks like it works most of the time, but for some files will hand you an
> eof token without giving you tokens for the entire file - it does what you
> tell it to, so if you didn't tell it that the content to parse ends only
> when the end of the file is encountered, then once it has satisfied the
> rules you gave it, it is "done" as far as it is concerned.

This is a hand-written lexer. I humbly submit that ANTLR is beyond my 
comprehension. I don't think it is even ANTLR itself; I think it may be the 
prospect of having to deal with generated code. That same reason has kept me 
away from CoffeeScript, Angular, and a few others. I caught this issue while 
writing unit tests for the lexer. With coverage at 80% at present, I can say 
I haven't encountered any unexpected EOFs so far. Since I do the integer 
comparison manually using ==, it is hard to miss EOF characters. 

> So, when in that state, read the remaining characters (if any) into a 
> StringBuilder, log them to stdout, see
> what they are and modify your grammar or whatever does the lexing to ensure
> they really get processed.

I will definitely try this. My suspicion is that there are some invisible 
characters I am not seeing. Maybe printing them to the console will help.
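
A rough sketch of that experiment follows. It reads until EOF (-1) and prints 
every remaining character as a U+XXXX code, so a null character or a 
zero-width character cannot hide. The IntSupplier here is only a stand-in for 
LexerInput.read(), and RemainingChars, describeRemaining and over are 
hypothetical names invented for this example.

```java
import java.util.function.IntSupplier;

final class RemainingChars {
    static final int EOF = -1; // same value that LexerInput.EOF uses

    // Drains the input and renders each char as U+XXXX, separated by spaces.
    static String describeRemaining(IntSupplier read) {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = read.getAsInt()) != EOF) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", c));
        }
        return sb.toString();
    }

    // Stand-in input: returns the chars of a string one by one, then EOF.
    static IntSupplier over(String text) {
        int[] pos = {0};
        return () -> {
            if (pos[0] < text.length()) {
                return text.charAt(pos[0]++);
            }
            return EOF;
        };
    }

    public static void main(String[] args) {
        // 'c', 'a', 'f', e-acute, a null character, and a zero-width space
        System.out.println(describeRemaining(over("caf\u00e9\u0000\u200b")));
        // prints "U+0063 U+0061 U+0066 U+00E9 U+0000 U+200B"
    }
}
```

A U+0000 in the output is a genuine null character in the input, which is 
distinct from EOF; a code such as U+200B or U+FEFF would point to an 
invisible character instead.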

