"Aaron M. Renn" <[EMAIL PROTECTED]> writes:

> What if the spec seems to be broken?

Go to http://www.nodule.demon.co.uk/java/ and file a report.  I agree
the JLS definition of StreamTokenizer is a little underspecified, but
I don't think it's broken.

> It lists various strategies for parsing, but doesn't say what order
> they should be applied in (other than white space is skipped first)
> except that I guess we could assume that we should apply the
> criteria in order (which looks wrong) or what should be done if
> conflict is encountered.

The criteria should be applied in order.

> First, there are five character attributes.  The spec says a
> character can have more than one attribute, but says nothing about
> the precedence the various attributes have.

BTW, the spec doesn't say this, but each attribute modifier can be
called multiple times, for instance:

wordChars(32, 45); wordChars(48, 52);

... would set chars 32 through 45 and 48 through 52 to have the
"alphabetic" attribute.
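
A minimal sketch of the cumulative behavior, assuming java.io.StreamTokenizer
as shipped with a JDK.  The class and method names (WordCharsDemo, firstWord)
and the character ranges are made up for illustration.  If the second
wordChars call replaced the first, 'a' and 'b' would no longer be word
characters and the input could not come back as a single word.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class WordCharsDemo {
    // Returns the text of the first token if it is a word, otherwise null.
    static String firstWord(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.resetSyntax();        // start from a table where every char is ordinary
        st.wordChars('a', 'c');  // first range gets the alphabetic attribute
        st.wordChars('x', 'z');  // second range is added to the first
        try {
            return st.nextToken() == StreamTokenizer.TT_WORD ? st.sval : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Mixes characters from both ranges; '-' is ordinary and ends the word.
        System.out.println(firstWord("axbz-rest"));
    }
}
```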

There's no need for precedence of attributes, if you apply the
criteria in order.

> For example, words are terminated by the first non-alpha/non-numeric
> char encountered.  This (from the example) seems to imply that a
> whitespace char terminated the word.  But what if the whitespace
> char is also an alpha char?

Any non-alpha/non-numeric char would terminate a word, so if a
whitespace char is an alpha char, then it would not terminate that
word.
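
This can be tried out with a sketch like the following, again assuming the
JDK's java.io.StreamTokenizer with its default syntax table.  The names
(WhitespaceAlphaDemo, words) are hypothetical.  Space is given the
alphabetic attribute on top of the defaults, while newline stays plain
whitespace, so only the newline should end a word.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class WhitespaceAlphaDemo {
    // Collects the text of every word token in the input.
    static List<String> words(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.wordChars(' ', ' ');  // space now also has the alphabetic attribute
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // The space inside "foo bar" does not end the word; the newline does.
        System.out.println(words("foo bar\nbaz"));
    }
}
```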

> Even if a given character can only have one attribute, the spec
> still seems broken from what I can tell.

Just do the parsing, step by step (i.e. paragraph by paragraph of the
JLS), and things should go ok.

> For example, a word (alpha) token is terminated by the first
> non-alpha token.  But if C++ comments are enabled, then "//" might
> not terminate the word, but instead would be included as part of the
> word.  Ditto for "/*" (C style comments).  That probably isn't what
> the user expected.

I fail to see the problem here. "/" is a non-alphabetic/non-numeric
character, so it will terminate the word.  You don't throw away the
"/", but instead parse that character on the next loop.

For example:

hey//this is a comment<EOL>

This results in the following behavior:
1st pass: hey = identifier
2nd pass: //this is a comment<EOL> = comment
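
The two passes above can be sketched as follows, assuming the JDK's
java.io.StreamTokenizer.  CommentDemo and words are hypothetical names, and
the trailing "there" is added only so there is something to read after the
comment.  Note that this tokenizer skips a comment rather than returning it
as a token, so the second pass shows up as the comment text being absent
from the output.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class CommentDemo {
    // Collects the text of every word token in the input.
    static List<String> words(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.ordinaryChar('/');         // '/' is neither alphabetic nor numeric
        st.slashSlashComments(true);  // recognize "//" comments
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // "hey" ends at the first '/'; the comment runs to the end of the line.
        System.out.println(words("hey//this is a comment\nthere"));
    }
}
```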

-- 
Paul Fisher * [EMAIL PROTECTED]
