On Fri, Apr 23, 2010 at 22:51, Paul Eggert <[email protected]> wrote:
> Paolo Bonzini <[email protected]> writes:
>
>> On 04/18/2010 06:32 AM, Ivan wrote:
>>> So... right now, "." means "valid UTF-8 character"? Or not?
>>
>> Yes, if your locale is UTF-8.
>
> Wouldn't it be better to model encoding errors as characters?  That is,
> if grep sees a byte that cannot possibly be the start of a character, we
> call it a "character" even though it is not in the standard Unicode
> character set.  Internally, we could model it as (say) a negative
> number, the negative of the byte value (so it would be in the range -255
> .. -128).

This would have to be changed in glibc first, and then in dfa.c.

Encoding errors in the regex are supported, but "." doesn't match an
invalid character.

Paolo
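To make Paul's proposal concrete, here is a minimal C sketch of a decoder that reports each encoding-error byte as the negative of its value. The name `decode_char` is hypothetical; it is not a glibc or dfa.c function. The sketch assumes that any byte which does not begin a complete, well-formed UTF-8 sequence (a stray continuation byte, a truncated, overlong or surrogate sequence, or a byte such as 0xFF) counts as a one-byte error "character". Because bytes below 0x80 are always valid ASCII, error values fall in -255 .. -128, as in the proposal.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only.
   Decode one "character" starting at s, with n > 0 bytes available,
   and store the number of bytes consumed in *len.
   Returns the code point (0 .. 0x10FFFF) for a well-formed UTF-8
   sequence, or -(byte value) for an encoding error, in which case
   exactly one byte is consumed.  */
static long
decode_char (const unsigned char *s, size_t n, size_t *len)
{
  unsigned char c = s[0];
  size_t need;          /* number of continuation bytes expected */
  long cp, min;         /* partial code point, smallest non-overlong value */

  if (c < 0x80)
    {
      *len = 1;
      return c;         /* ASCII is always valid */
    }
  else if (c >= 0xC2 && c <= 0xDF)
    need = 1, cp = c & 0x1F, min = 0x80;
  else if (c >= 0xE0 && c <= 0xEF)
    need = 2, cp = c & 0x0F, min = 0x800;
  else if (c >= 0xF0 && c <= 0xF4)
    need = 3, cp = c & 0x07, min = 0x10000;
  else
    goto error;         /* 0x80..0xC1 and 0xF5..0xFF never start a character */

  if (n <= need)
    goto error;         /* sequence truncated by end of buffer */
  for (size_t i = 1; i <= need; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        goto error;     /* missing continuation byte */
      cp = (cp << 6) | (s[i] & 0x3F);
    }
  if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    goto error;         /* overlong, out of range, or surrogate */
  *len = need + 1;
  return cp;

 error:
  *len = 1;             /* treat the single offending byte as a "character" */
  return -(long) c;
}
```

Decoding the bytes `61 C3 A9 80 FF` in a loop that advances by `*len` yields 97, 233, -128 and -255. Since valid code points are never negative, a matcher built on such a decoder can tell encoding errors apart from real characters with a simple sign test.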

