Re: [PHP-DEV] RC2 and integer/float handling in 5.3

Matt Wilmas Mon, 06 Apr 2009 18:51:12 -0700

Hi again Brian,

----- Original Message -----
From: "shire"
Sent: Monday, April 06, 2009


Hey Matt,

Matt Wilmas wrote:

Yep, a 5.3 snapshot that I compiled myself a couple of days ago on Windows
(not that that should matter). I'm not regenerating the scanner with re2c,
which also shouldn't matter; I'm using the existing .c file. (I haven't
touched the scanner stuff in a long time, so I haven't had to regenerate
it yet.) The scanner of course hasn't changed since then.


Here's what I'm currently doing (more or less with some changed paths):
[...]
  [1]=>
  array(3) {
    [0]=>
    int(366)
    [1]=>
"   string(57) "// this comment and trailing blank contain windows CR+LF
    [2]=>
    int(2)
  }

As a side note, I just noticed that the full Windows newline (\r\n, CR+LF) isn't getting taken with the comment (\n included in WHITESPACE after), as *nix's \n does. See the " before string(57)? Because the CR is resetting the line I guess, without going to the next. It's this rule:


<ST_ONE_LINE_COMMENT>[^\n\r?%>]*{ANY_CHAR} {

that's only matching the \r before returning T_COMMENT. Simple enough to fix as well, but I hadn't spotted that one until I was trying to see why that quote was out of place. :-) (This isn't new in 5.3 though...)

  [2]=>
  array(3) {
    [0]=>
    int(371)
    [1]=>
    string(3) "

"
    [2]=>
    int(2)
  }
}
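To make the stray CR visible instead of letting it reset the terminal line, the tokens can be printed with CR and LF escaped. This is only a sketch: `visible_tokens()` is a hypothetical helper, not part of the tokenizer extension, and the exact split between T_COMMENT and T_WHITESPACE depends on the PHP version being run.

```php
<?php
// Hypothetical helper (not part of ext/tokenizer): returns each token as
// [name, text], with CR and LF escaped so a raw "\r" cannot move the cursor.
function visible_tokens(string $source): array
{
    $out = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // addcslashes() turns the listed control characters into \r and \n.
            $out[] = [token_name($token[0]), addcslashes($token[1], "\r\n")];
        } else {
            // Single-character tokens come back as plain strings.
            $out[] = ['(char)', $token];
        }
    }
    return $out;
}

// Same shape as the second test file: LF after the open tag, CR+LF elsewhere.
$source = "<?php\n// this comment and trailing blank contain windows CR+LF\r\n\r\n";

foreach (visible_tokens($source) as [$name, $text]) {
    echo $name, ' "', $text, "\"\n";
}
```

Whichever token ends up holding the \r, it now shows as a literal backslash sequence in the output, so a misplaced quote like the one above is easy to trace.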


The newlines look like this in the second file:

<?php$
// this comment and trailing blank contain windows CR+LF^M$
^M$

Unfortunately I can't test on a windows build, perhaps you could re-test
or share your reproduction that fails as this seems to work for me unless
I'm of course missing some difference.

Test case is the one in the bug report. :-) Last token is not the
comment, but whitespace.


There are two reproductions in the bug report ;-)

Oops, forgot about the second one -- I meant the first in the initial report. The part I'm talking about is: "It only seems to occur if there isn't a newline behind the comment." So the easiest way to see is simply:


var_dump(token_get_all('<?php // test'));

array(1) {
 [0]=>
 array(3) {
   [0]=>
   int(368)
   [1]=>
   string(6) "<?php "
   [2]=>
   int(1)
 }
}
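A quicker check than reading the whole dump is to look only at the name of the last token. The sketch below assumes a hypothetical helper, `last_token_name()`; on a build without the problem it should report the comment token for this input, whereas the dump above stops at the open tag.

```php
<?php
// Hypothetical helper: name of the last token returned by token_get_all(),
// or the character itself for single-character tokens.
function last_token_name(string $source): string
{
    $tokens = token_get_all($source);
    if ($tokens === []) {
        return '';
    }
    $last = end($tokens);
    return is_array($last) ? token_name($last[0]) : $last;
}

// No newline after the comment, as in the first reproduction.
echo last_token_name('<?php // test'), "\n";
```

Using token_name() rather than the numeric ID keeps the check readable, since the integer values of the token constants are not fixed across PHP versions.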

Also, the unterminated comment Warning is still missing with "<?php /*
blah " like it's been since the re2c change (except maybe for the time
your fix was applied). My changes would clean this up of course, unless
you do something first.


I think fixing this would be great as well as the other highlighter test
that was changed.  I would just prefer that the scanner handle these
rather than us implementing what is essentially a hand-written scanner
within the lexer file.

Yeah, I remember you said that last time. :-) But like the inline HTML scanner part you mentioned then, if it's pretty simple to implement manually, I thought it seemed logical (I don't know if that stuff was possible with how flex worked; it was only after seeing the HTML scanning that I thought, "Ah.") The regex would've generated more code, and probably wouldn't make much difference for readability...? (I still wonder if it wasn't used because it wouldn't work with the re2c issues otherwise.) With the string, etc. scanning, my regular expressions are pretty complicated, to match stuff that isn't very complicated, which generates a LOT of code, and probably aren't that readable or easy to understand, even with the comments. Well anyway, if I do something I'll send it along for analysis!
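For illustration only, scanning a one-line comment by hand could look roughly like the PHP sketch below. It is not code from the scanner: the function name is hypothetical, it ignores the ASP-style %> tag that the ST_ONE_LINE_COMMENT rule also guards against, and it assumes the line terminator (LF, CR+LF, or a lone CR) belongs to the comment.

```php
<?php
// Hypothetical sketch, not php-src code: length of a one-line comment that
// starts at offset $start, including its line terminator when there is one.
function one_line_comment_length(string $src, int $start): int
{
    $len = strlen($src);
    $i = $start;
    while ($i < $len) {
        $c = $src[$i];
        if ($c === "\n") {
            return $i + 1 - $start;          // plain LF ends the comment
        }
        if ($c === "\r") {
            $i++;
            if ($i < $len && $src[$i] === "\n") {
                $i++;                        // take the LF of a CR+LF pair too
            }
            return $i - $start;
        }
        if ($c === '?' && $i + 1 < $len && $src[$i + 1] === '>') {
            return $i - $start;              // stop before a closing tag
        }
        $i++;
    }
    return $len - $start;                    // end of input, no newline
}

var_dump(one_line_comment_length("// a\r\nx", 0));
```

The loop is a handful of comparisons per character, which is the trade-off under discussion: less generated code than an equivalent regex, at the cost of logic that re2c no longer checks.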


-shire

- Matt


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
