Re: Unicode regex matches 's' and 'k'

jj Fri, 30 Apr 2021 13:18:20 -0700

Hi Sven,

Is it possible that you did a case-insensitive search (that is, the "Case 
sensitive" check box was unchecked in the Find window)?
If so, it is not a bug but simply Unicode case conversion: your range 
includes the following two Unicode characters, and the regex finds their 
"lowercase" versions:


Unicode Character “K” (U+212A) is based on k
https://www.compart.com/en/unicode/U+212a

Unicode Character “ſ” (U+017F) is based on s
https://www.compart.com/en/unicode/U+017f

If you try your regex with "Case sensitive" checked, 's' and 'k' are not 
found because there is no case conversion.
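
To see the effect outside BBEdit, here is a minimal Python sketch. It does 
not use BBEdit's regex engine; it only assumes that case-insensitive matching 
behaves roughly like Unicode case folding, and the helper name 
ascii_case_equivalents is made up for this illustration. It scans the range 
U+007F to U+FFFF and reports every character whose case-folded form is a 
single ASCII character:

```python
# Hypothetical helper, not BBEdit's implementation: it approximates
# case-insensitive matching with Python's Unicode case folding.
RANGE_START = 0x007F
RANGE_END = 0xFFFF


def ascii_case_equivalents(start=RANGE_START, end=RANGE_END):
    """Map each ASCII character to the in-range code points that fold to it."""
    hits = {}
    for code_point in range(start, end + 1):
        folded = chr(code_point).casefold()
        # Keep only one-character folds that land below the range, i.e. in ASCII.
        if len(folded) == 1 and ord(folded) < start:
            hits.setdefault(folded, []).append(f"U+{code_point:04X}")
    return hits


if __name__ == "__main__":
    for ascii_char, sources in sorted(ascii_case_equivalents().items()):
        print(ascii_char, "<-", ", ".join(sources))
```

Run as a script, this should list only 'k' and 's', traced back to U+212A and 
U+017F respectively, which are the same two characters as above.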

Case conversion + Unicode + Locales can be tricky.

Regards,

Jean Jourdain
On Friday, April 30, 2021 at 2:16:20 PM UTC+2 [email protected] wrote:

> Hi!
>
> I was going to run a regular expression on a large document.
> What I wanted to extract was lines matching [\x{007f}-\x{ffff}], also 
> known as high or extended ASCII.
>
> When I search for that pattern in the document, however, it also oddly 
> matches the characters "s" and "k", which according to the Character 
> inspector have Unicode 0073 and 006B respectively.
>
> Am I doing something wrong here? It seems to me this could be a bug.
>
> I'm at BBEdit 13.5.6.
>
> Best regards,
> Sven
>

-- 
This is the BBEdit Talk public discussion group. If you have a feature request 
or need technical support, please email "[email protected]" rather than 
posting here. Follow @bbedit on Twitter: <https://twitter.com/bbedit>
--- 
You received this message because you are subscribed to the Google Groups 
"BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/bbedit/6e5c63ac-0cc2-47b5-945a-ea93a63c33fdn%40googlegroups.com.
