2009/4/30 Dmitry Stogov <dmi...@zend.com>:
> Hi Matt,
>
> Does this patch fix EOF handling issues related to mmap()? (e.g. parsing of
> files with size 4096, 8192, ...). Now we have two dirty fixes to handle them
> correctly.
>
> The patch is too big to understand quickly. I'll probably take a look over
> the weekend.
>
> -ANY_CHAR [^\x00]
> +ANY_CHAR [^]
>
> Is [^] a correct regular expression?
>
> Thanks. Dmitry.
>
> Matt Wilmas wrote:
>>
>> Hi Dmitry, Brian, all,
>>
>> Here's a scanner patch that I mentioned a while ago, with a possible way to
>> work around the re2c EOF handling issues.
>>
>> The primary change is to do a "manual scan" like I talked about in areas
>> that match large amounts and can contain NULL bytes (strings/comments, which
>> are now scanned faster too), as is done for inline HTML.  I called it a
>> "diet" :-) because it removes my complicated string regex patterns from a
>> couple years ago, which doesn't make the .l file much smaller after adding
>> the manual scan code (easier to understand...?), but it does result in a
>> ~34k reduction of 5.3's generated .c file...
>>
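
To make the "manual scan" idea concrete, here is a minimal sketch in C. It is
not taken from the patch: the function name and signature are hypothetical, and
it only assumes that the scanner knows how many bytes of input remain. The
point is that a length-bounded loop treats NUL bytes as ordinary data instead
of relying on a terminator.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the patch's code: find the end of a block
 * comment by walking the buffer by hand. `buf` points just past the
 * opening slash-star and `len` is the number of input bytes left.
 * Returns the offset just past the closing star-slash, or `len` if the
 * comment is never closed. The loop is bounded by `len`, so a NUL byte
 * is plain data and is never mistaken for end of input.
 */
size_t scan_block_comment(const char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL || len == 0);

    for (i = 0; i + 1 < len; i++) {
        if (buf[i] == '*' && buf[i + 1] == '/') {
            return i + 2;   /* offset just past the terminator */
        }
    }
    return len;             /* unterminated: consume everything */
}
```

A caller would compare the return value with `len` to decide whether the
comment was closed or ran to the end of the file.
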
>> This fixes Bug #46817 and provides a better, more proper fix for the older
>> Bug #42767, both related to ending comments.
>>
>> Now inline HTML chunks aren't broken up when a tag starting with "s" is
>> encountered (<script> for JS, <span>, etc.), since it's unlikely to be a
>> long PHP <script> tag.
>>
>> If an opening PHP <SCRIPT> tag was used with a capital "S", it was missed
>> if it wasn't the first thing scanned:
>>
>> var_dump(token_get_all("HTML... <SCRIPT language=php>"));
>>
>> Single-line comments with a Windows newline didn't include the full \r\n:
>>
>> var_dump(token_get_all("<?php // Comment\r\n?>"));
>>
>> Finally, part of the optimized scanning is that, for double quoted
>> strings, when the first variable is encountered (making it non-constant),
>> the amount that's been scanned up to that point is remembered, which can
>> then be skipped over (up to the variable) after returning the quote token.
>> Previously that initial part of the string was rescanned -- the cost
>> dependent on how far "into" the string the first var is.
>>
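
The double-quoted string optimization described above can be sketched roughly
like this. It is an illustration under simplifying assumptions, not the patch's
code: the function name is made up, and variable detection is reduced to ASCII
"$name", "${" and "{$". The returned offset is the amount that was already
scanned and could be skipped after the quote token is returned.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical sketch: scan the body of a double-quoted string, starting
 * just after the opening quote. Returns the number of bytes scanned before
 * the first variable or the closing quote (or `len` if neither is found).
 * `*has_var` is set to 1 when a variable stopped the scan, so the caller
 * knows the string is non-constant and how much of it is already scanned.
 */
size_t scan_dq_prefix(const char *s, size_t len, int *has_var)
{
    size_t i = 0;

    assert(s != NULL && has_var != NULL);
    *has_var = 0;

    while (i < len) {
        unsigned char c = (unsigned char)s[i];

        if (c == '\\' && i + 1 < len) {
            i += 2;         /* skip the backslash and the escaped byte */
            continue;
        }
        if (c == '"') {
            break;          /* closing quote: the string is constant */
        }
        if (i + 1 < len) {
            unsigned char n = (unsigned char)s[i + 1];

            /* Simplified (ASCII-only) test for the start of a variable. */
            if ((c == '$' && (isalpha(n) || n == '_' || n == '{')) ||
                (c == '{' && n == '$')) {
                *has_var = 1;
                break;
            }
        }
        i++;
    }
    return i;
}
```
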
>>
>> I think that's about all --  I'll send another message if I forgot to
>> mention anything...  Just wanted to send this along quickly for you guys to
>> look at.  It was basically done last week; I just had to do a
>> couple finishing touches and verify that everything was OK.
>>
>> http://realplain.com/php/scanner_diet.diff (Merged changes, but didn't
>> test yet.)
>> http://realplain.com/php/scanner_diet_5_3.diff
>>
>>
>> Thanks,
>> Matt
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>

Hmm. RegexBuddy explains that as ...

Match any character that is NOT a “]”

But it sure doesn't look valid without a closing ].
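
For comparison, here is a small C sketch using the POSIX regex API, which is
not re2c and has its own bracket rules. Under POSIX, a "]" directly after "[^"
is a literal member of the class, so "[^]" is an unterminated bracket
expression, while "[^]]" is the complete form of "any character that is not
]". As far as I know, re2c's own syntax differs here and accepts [^] as "any
character", so this only illustrates the RegexBuddy-style reading. The helper
names are made up for the example.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if `pattern` compiles as a POSIX extended regex, else 0. */
int posix_pattern_compiles(const char *pattern)
{
    regex_t re;

    assert(pattern != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        return 0;           /* compile error, e.g. unmatched bracket */
    }
    regfree(&re);
    return 1;
}

/* Returns 1 if `pattern` compiles and matches somewhere in `text`. */
int posix_pattern_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        return 0;
    }
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With these helpers, "[^]" should fail to compile, while "^[^]]$" should match
"a" and reject "]".
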



-- 
-----
Richard Quadling
Zend Certified Engineer : http://zend.com/zce.php?c=ZEND002498&r=213474731
"Standing on the shoulders of some very clever giants!"

