Hi,
I've been using the Universal Encoding detection plugin to guess the encoding
of files we import, because we never know their encoding in advance.
Most of the time it guesses correctly, but sometimes it does not.
Once the encoding is guessed, the strings are converted to UTF-8 and stored in
a PostgreSQL database, which rejects any string containing an invalid UTF-8
sequence.
So how can I check that a string is valid UTF-8 before I store it?
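To make clear what kind of check I'm after, here is a rough sketch. It is in Python rather than REALbasic, purely as an illustration, and the function name is_valid_utf8 is made up for this example. It simply attempts a strict UTF-8 decode of the raw bytes and treats a decoding error as "invalid".

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is a well-formed UTF-8 byte sequence.

    Hypothetical helper for illustration only; not part of any plugin.
    """
    try:
        # Strict decoding raises on stray continuation bytes,
        # truncated multi-byte sequences and overlong forms.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# "ï" encoded as UTF-8 is the two bytes C3 AF, so this is valid.
print(is_valid_utf8("naïve".encode("utf-8")))

# A lone EF byte (Latin-1 "ï") followed by an ASCII letter is not valid UTF-8.
print(is_valid_utf8(b"na\xefve"))
```

The second case is the situation I keep hitting: a file guessed as UTF-8 that actually contains single high bytes such as EF.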
The solutions I've found online use a regular expression to check for certain
byte sequences.
I can't get this to work properly with either the built-in Regex or
RegexMBS.
For example,
if re.Compile("\xEF") then
does not find the hex value EF in the string, even though the byte is there
when I inspect the string's bytes in the IDE.
What am I doing wrong?
Lee Badham
www.bodoni.co.uk | www.presssign.com