[^] is a special case in re2c: it is the portable way to write "match any character".
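
To make the difference in the ANY_CHAR change concrete, here is a minimal sketch in plain C. It is not re2c output and not code from the patch; matches_not_nul, matches_any and span are hypothetical helpers that only model which bytes each character class accepts.

```c
#include <stddef.h>

/* Hypothetical helpers (assumptions for illustration only): they model
 * which bytes each character class from the diff accepts. */

/* Models [^\x00]: every byte except NUL. */
int matches_not_nul(unsigned char c) { return c != 0x00; }

/* Models [^]: an empty negated class, so no byte is excluded. */
int matches_any(unsigned char c) { (void)c; return 1; }

/* Counts how many leading bytes of buf[0..len) a repeated class consumes. */
size_t span(const unsigned char *buf, size_t len,
            int (*cls)(unsigned char)) {
    size_t n = 0;
    while (n < len && cls(buf[n])) {
        n++;
    }
    return n;
}
```

With the old class a run stops at the first embedded NUL byte, while with [^] it continues to the end of the buffer.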

Scott

Dmitry Stogov wrote:
> Hi Matt,
> 
> Does this patch fix EOF handling issues related to mmap()? (e.g. parsing
> of files with size 4096, 8192, ...). Now we have two dirty fixes to
> handle them correctly.
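
For context on why sizes like 4096 and 8192 are special: mmap() zero-fills the unused tail of the last page, so a scanner that expects a NUL byte right after the input finds one only when the file does not end exactly on a page boundary. A minimal sketch of that condition follows; has_zero_padding_after_eof is a hypothetical name and the page size is passed in as an assumption rather than queried from the system.

```c
#include <stddef.h>

/* Hypothetical helper (not from the PHP source): returns 1 when a
 * mapping of file_size bytes leaves zero-filled bytes after the end of
 * the file inside its last page, 0 when the file ends exactly on a
 * page boundary and there is no padding to act as a sentinel. */
int has_zero_padding_after_eof(size_t file_size, size_t page_size) {
    return page_size != 0 && file_size % page_size != 0;
}
```

Assuming 4096-byte pages, files of 4096 or 8192 bytes have no padding, which is the case the existing workarounds have to handle.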
> 
> The patch is too big to understand quickly. I'll probably take a
> look over the weekend.
> 
> -ANY_CHAR [^\x00]
> +ANY_CHAR [^]
> 
> Is [^] a correct regular expression?
> 
> Thanks. Dmitry.
> 
> Matt Wilmas wrote:
>> Hi Dmitry, Brian, all,
>>
>> Here's a scanner patch that I mentioned a while ago, with a possible
>> way to work around the re2c EOF handling issues.
>>
>> The primary change is to do a "manual scan" like I talked about in
>> areas that match large amounts and can contain NULL bytes
>> (strings/comments, which are now scanned faster too), as is done for
>> inline HTML.  I called it a "diet" :-) because it removes my
>> complicated string regex patterns from a couple years ago, which
>> doesn't make the .l file much smaller after adding the manual scan
>> code (easier to understand...?), but it does result in a ~34k
>> reduction of 5.3's generated .c file...
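
The "manual scan" idea can be sketched roughly as below. This is not the code from the patch, only an illustration under assumptions: scan_comment_end is a hypothetical helper, and cursor/limit stand for the current position and the end of the input. The point is that the loop is bounded by the limit pointer, so embedded NUL bytes are ordinary data instead of an end-of-input marker.

```c
#include <stddef.h>

/* Hypothetical helper (illustration only). cursor points just past the
 * opening of a block comment; limit points one past the last input
 * byte. Returns a pointer just past the closing star-slash pair, or
 * NULL if the comment is unterminated. */
const unsigned char *scan_comment_end(const unsigned char *cursor,
                                      const unsigned char *limit) {
    while (cursor < limit) {
        /* Check the next byte only if it is still inside the input. */
        if (*cursor == '*' && cursor + 1 < limit && cursor[1] == '/') {
            return cursor + 2;
        }
        cursor++; /* any other byte, NUL included, is skipped */
    }
    return NULL;
}
```

A caller would emit the comment token up to the returned pointer, or treat NULL as an unterminated comment.
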
>>
>> This fixes Bug #46817 and provides a better, more proper fix for the
>> older Bug #42767, both related to ending comments.
>>
>> Now inline HTML chunks aren't broken up when a tag starting with "s"
>> is encountered (<script> for JS, <span>, etc.), since it's unlikely to
>> be a long PHP <script> tag.
>>
>> If an opening PHP <SCRIPT> tag was used with a capital "S", it was
>> missed if it wasn't the first thing scanned:
>>
>> var_dump(token_get_all("HTML... <SCRIPT language=php>"));
>>
>> Single-line comments with a Windows newline didn't include the full \r\n:
>>
>> var_dump(token_get_all("<?php // Comment\r\n?>"));
>>
>> Finally, part of the optimized scanning is that, for double quoted
>> strings, when the first variable is encountered (making it
>> non-constant), the amount that's been scanned up to that point is
>> remembered, which can then be skipped over (up to the variable) after
>> returning the quote token. Previously that initial part of the string
>> was rescanned -- the cost dependent on how far "into" the string the
>> first var is.
>>
>>
>> I think that's about all --  I'll send another message if I forgot to
>> mention anything...  Just wanted to send this along quickly for you
>> guys to look at or whatever.  It was basically done last week, I just
>> had to do a couple finishing touches and verify that everything was OK.
>>
>> http://realplain.com/php/scanner_diet.diff (Merged changes, but didn't
>> test yet.)
>> http://realplain.com/php/scanner_diet_5_3.diff
>>
>>
>> Thanks,
>> Matt
> 

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
