Of course - delimiters are not part of the string length - I see now why you can have (in theory) an unbounded prefix/suffix.

Personally, I find the argument "because you can have unlimited-length identifiers" not a great fit. From a lexer writer's perspective, I can see why it is offered as a comparison - after all, an identifier is a token whose size is unbounded. But I find it hard to ignore that the roles played by identifiers and delimiters in the grammar are quite different.

At least there were other cases where we found a different trade-off between expressiveness and practicality - see Project Coin's use of repeated underscores in binary literals (subsequently banned):

private static final int BOND =
 0000_____________0000________0000000000000000__000000000000000000+
 00000000_________00000000______000000000000000__0000000000000000000+
  000____000_______000____000_____000_______0000__00______0+
 000______000_____000______000_____________0000___00______0+
0000______0000___0000______0000___________0000_____0_____0+
0000______0000___0000______0000__________0000___________0+
0000______0000___0000______0000_________0000__0000000000+
0000______0000___0000______0000________0000+
 000______000_____000______000________0000+
  000____000_______000____000_______00000+
   00000000_________00000000_______0000000+
     0000_____________0000________000000007;

(Example courtesy of Joshua Bloch)

Maurizio


On 26/02/18 21:54, Jim Laskey wrote:
Why introduce an artificial limit? Identifiers don’t have a limit.

3.8. Identifiers
An identifier is an *unlimited-length sequence* of Java letters and Java digits, the first of which must be a Java letter.

— Jim

On Feb 26, 2018, at 5:29 PM, Maurizio Cimadamore <maurizio.cimadam...@oracle.com> wrote:



On 26/02/18 20:17, John Rose wrote:
Any *finite choice* of end-quotes has the same problem, with
a non-zero probability that decreases (but does not vanish)
with the number of available end-quotes.  The only way to
break out of the box is to allow the user an unlimited range
of successively "stronger" end-quotes (i.e., less likely ones).

In reality there is a finite upper bound on this length, given by 2^16 / 2 = 2^15. That is the longest delimiter you could encode in a Java String and still close symmetrically - and it is an edge case, since the enclosed string would be empty.

So, yes, on paper I agree with the argument; in practice, I guess I'd be more in favor of limiting the number of repetitions - I wouldn't like to open the door to puzzlers:

`````````````````````````````````````````````````````````````````````````hello`````````````````````````````````````````````````````````````````````````

(which might leave some ASCII art lovers a bit unhappy :-))

I think limiting to 8, or some other reasonably small number, will probably reduce the clash probability enough? And even if it's not enough, I guess we'd still be left with the question of whether a long (possibly unbounded?) escaping sequence is something we'd like to see in Java.
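
To make the effect of a cap concrete, here is a minimal sketch, not part of any proposal in this thread. It assumes, purely for illustration, a conservative rule: a delimiter of n backticks is safe only if the content contains no run of n or more backticks. The class and method names are hypothetical.

```java
// Hypothetical helper, for illustration only.
final class DelimiterSketch {

    // Length of the longest run of consecutive backticks in the content.
    static int longestBacktickRun(String content) {
        int longest = 0;
        int current = 0;
        for (int i = 0; i < content.length(); i++) {
            if (content.charAt(i) == '`') {
                current++;
                longest = Math.max(longest, current);
            } else {
                current = 0;
            }
        }
        return longest;
    }

    // Assumed rule: the delimiter must be one backtick longer than
    // the longest run that appears inside the content.
    static int minDelimiterLength(String content) {
        return longestBacktickRun(content) + 1;
    }

    // True if the content can still be quoted when delimiters are
    // limited to at most `cap` backticks.
    static boolean fitsUnderCap(String content, int cap) {
        return minDelimiterLength(content) <= cap;
    }
}
```

Under that assumption, a cap of 8 only rejects content that already contains a run of eight or more backticks, which is the clash case being discussed.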

Maurizio

