"Aaron M. Renn" <[EMAIL PROTECTED]> writes:
> What if the spec seems to be broken?
Go to http://www.nodule.demon.co.uk/java/ and file a report. I agree
the JLS definition of StreamTokenizer is a little underspecified, but
I don't think it's broken.
> It lists various strategies for parsing, but doesn't say what order
> they should be applied in (other than white space is skipped first)
> except that I guess we could assume that we should apply the
> criteria in order (which looks wrong) or what should be done if a
> conflict is encountered.
The criteria should be applied in order.
> First, there are five character attributes. The spec says a
> character can have more than one attribute, but says nothing about
> the precedence the various attributes have.
BTW, the spec doesn't say this, but each attribute modifier can be
called multiple times, for instance:
    wordChars(32, 45); wordChars(48, 52);
... would set chars 32 through 45 and 48 through 52 to have the
"alphabetic" attribute.
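
As a rough sketch of the above, assuming the JDK's java.io.StreamTokenizer
behaves the way the spec intends (the class name WordCharsDemo and the
tokens() helper are made up for illustration, and resetSyntax() is only
called so that nothing but the two ranges is alphabetic):

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class WordCharsDemo {
    // Hypothetical helper: collects every token up to end of input as a string.
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.resetSyntax();      // start with every char "ordinary"
        st.wordChars(32, 45);  // first range
        st.wordChars(48, 52);  // second range is added to the first, not substituted
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);                          // a run of alphabetic chars
                } else {
                    out.add(String.valueOf((char) st.ttype));  // a single ordinary char
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // '!' and '#' fall in 32..45, '1'..'3' fall in 48..52;
        // '/', '.', '5' and '6' are in neither range.
        System.out.println(tokens("!!#/123.56"));
    }
}
```

Both ranges end up alphabetic, so "!!#" and "123" each come back as one
word, while the characters outside the ranges come back one at a time.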
There's no need for precedence of attributes, if you apply the
criteria in order.
> For example, words are terminated by the first non-alpha/non-numeric
> char encountered. This (from the example) seems to imply that a
> whitespace char terminated the word. But what if the whitespace
> char is also an alpha char?
Any non-alpha/non-numeric char would terminate a word, so if a
whitespace char is an alpha char, then it would not terminate that
word.
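
Here is a small sketch of that case, again assuming the JDK's
java.io.StreamTokenizer as the implementation (SpaceInWordDemo and
tokens() are names invented for this example). It leaves the default
syntax table alone and only adds the alphabetic attribute to the space
character, which is already whitespace:

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class SpaceInWordDemo {
    // Hypothetical helper: collects every token up to end of input as a string.
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.wordChars(' ', ' ');  // space is now both whitespace and alphabetic
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // Leading spaces are skipped as whitespace (that step comes first),
        // but the space inside "foo bar" does not end the word.
        System.out.println(tokens("  foo bar,baz"));
    }
}
```

Applying the criteria in order gives "foo bar", then ",", then "baz": the
whitespace step only runs at the start of a token, and once a word has
started, any alphabetic char (the space included) extends it.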
> Even if a given character can only have one attribute, the spec
> still seems broken from what I can tell.
Just do the parsing step by step (i.e. paragraph by paragraph of the
JLS), and things should go OK.
> For example, a word (alpha) token is terminated by the first
> non-alpha token. But if C++ comments are enabled, then "//" might
> not terminate the word, but instead would be included as part of the
> word. Ditto for "/*" (C style comments). That probably isn't what
> the user expected.
I fail to see the problem here. "/" is a non-alphabetic/non-numeric
character, so it will terminate the word. You don't throw away the
"/", but instead parse that character on the next loop.
For example:
hey//this is a comment<EOL>
This results in the following behavior:
1st pass: hey = identifier
2nd pass: //this is a comment<EOL> = comment
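
The two passes can be sketched roughly like this, assuming the JDK's
java.io.StreamTokenizer (SlashCommentDemo and tokens() are made-up
names). Note that the JDK class skips comments silently rather than
returning them as tokens, so the second pass is not visible to the
caller. The sketch makes '/' an ordinary char first, because the default
syntax table already treats a single '/' as a comment char:

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class SlashCommentDemo {
    // Hypothetical helper: collects every token up to end of input as a string.
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.ordinaryChar('/');         // a lone '/' is just an ordinary char
        st.slashSlashComments(true);  // "//" starts a comment that runs to end of line
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // The first '/' ends the word "hey"; it is then re-read as the start of "//".
        System.out.println(tokens("hey//this is a comment\nnext"));
        // A single '/' still ends the word, but comes back as its own token.
        System.out.println(tokens("hey/you"));
    }
}
```

The first input yields "hey" and "next", the second "hey", "/" and "you".
In both cases the '/' terminates the word and is looked at again on the
next call to nextToken().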
--
Paul Fisher * [EMAIL PROTECTED]