What if the spec seems to be broken?  I'm thinking of the supremely brain
dead StreamTokenizer.  It lists various strategies for parsing, but doesn't
say what order they should be applied in (other than that white space is
skipped first), or what should be done if a conflict is encountered.  I
guess we could assume that the criteria apply in the order listed, but that
looks wrong.

First, there are five character attributes.  The spec says a character can
have more than one attribute, but says nothing about the precedence of the
various attributes.  For example, a word is terminated by the first
non-alpha/non-numeric char encountered.  The example in the spec seems to
imply that a whitespace char terminates the word.  But what if the
whitespace char is also an alpha char?
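
One way to find out what a given implementation actually does with that
case is a small experiment.  The following is only a sketch: the class and
method names are made up for illustration, and the result says what the
library under test does, not what the spec requires.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical probe class; the names are illustrative only.
class AttributeOverlapProbe {

    // Tokenizes the input and returns the tokens as strings.  When
    // spaceIsAlsoWord is true, the space character is additionally declared
    // a word constituent; whether it also keeps its whitespace attribute is
    // exactly the question being probed.
    static List<String> tokens(String input, boolean spaceIsAlsoWord) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        if (spaceIsAlsoWord) {
            st.wordChars(' ', ' ');
        }
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(String.valueOf(st.nval));
                } else {
                    // Ordinary character: report it as a one-char token.
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("foo bar", false));
        System.out.println(tokens("foo bar", true));
    }
}
```

If the second line comes back as a single token, the alpha attribute won
inside the word; if it comes back as two tokens, the whitespace attribute
won.  Either way, the spec doesn't appear to tell us which is right.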

Second, the parsing criteria say to do the following, though they don't say
in what order.  This is the order the JLS lists them in:

-- Skip whitespace (explicitly stated as the first thing to do)
-- Check numeric
-- Check alpha
-- Check comment
-- Check quote
-- Check C++ comment
-- Check C comment
-- Default

Even if a given character can only have one attribute, the spec still seems
broken from what I can tell.  For example, a word (alpha) token is
terminated by the first non-alpha char.  But if C++ comments are enabled,
then "//" might not terminate the word, but instead would be included as
part of the word.  Ditto for "/*" (C style comments).  That probably isn't
what the user expected.
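
The same kind of experiment works for the comment case.  This is a minimal
sketch, again with made-up class and method names, that turns on C++ style
comments and optionally declares '/' a word char, so we can see whether
"//" ends the word or gets swallowed into it on the implementation being
run.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical probe class; the names are illustrative only.
class SlashCommentProbe {

    // Tokenizes the input with C++ style comments enabled.  When
    // slashIsWordChar is true, '/' is also declared a word constituent.
    static List<String> tokens(String input, boolean slashIsWordChar) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.slashSlashComments(true);
        if (slashIsWordChar) {
            st.wordChars('/', '/');
        }
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(String.valueOf(st.nval));
                } else {
                    // Ordinary character: report it as a one-char token.
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "foo//bar\nbaz";
        System.out.println(tokens(input, false));
        System.out.println(tokens(input, true));
    }
}
```

If "//bar" shows up inside a word token in the second run, then the word
rule is being applied ahead of the comment rule, which is the conflict
described above.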

Any insights?

-- 
*****************************************************
* Aaron M. Renn                                     *
* Email: [EMAIL PROTECTED]                      *
* Homepage: <URL:http://www.urbanophile.com/arenn/> *
*****************************************************
