Hi,

I've been using the Universal Encoding detection plugin to guess the encoding 
of files we import, because we never know their encoding in advance.

Most of the time it gets it right, but sometimes it does not.

Once the encoding is guessed, the strings are converted to UTF-8 and stored in 
a PostgreSQL database, which complains if a string contains an invalid UTF-8 
sequence.

So how can I check that a string is valid UTF-8 before storing it?

The solutions I've found online use a regex to check for certain byte 
sequences.

I can't get this to work properly with either the built-in RegEx class or 
RegExMBS.

if re.Compile("\xEF") then

This does not find the hex value EF in the string, even though I can see that 
byte when I inspect the string's bytes in the IDE.

What am I doing wrong?

Lee Badham

www.bodoni.co.uk | www.presssign.com


