begs the bigger question: why doesn't the world just settle on UTF-8 and be
done with this encoding nonsense once and for all?  It continues to drive
me crazy, and I've been writing code for over 30 years!

On Tue, Feb 12, 2013 at 4:10 AM, Lee Badham <[email protected]> wrote:

> Hi,
>
> I've been trying to use the Universal Encoding detection plugin to test
> for encoding of files we import because we never know their encoding.
>
> Most of the time it gets it right, but sometimes it does not.
>
> Once the encoding is guessed, the strings are then converted to UTF-8 to
> be stored in a PostgreSQL database which complains if there is an invalid
> UTF-8 sequence.
>
> So how can I check for a valid UTF-8 string?
>
> The solutions I've found online involve using Regex to check for a certain
> byte sequence.
>
> I can't get this to work properly using either the built-in Regex
> or RegexMBS.
>
> if re.Compile("\xEF") then
>
> Does not find the hex value EF in the string, even though when I look at
> the byte of the string from the IDE it is there.
>
> What am I doing wrong?
>
> Lee Badham
>
> www.bodoni.co.uk | www.presssign.com
>
>
> _______________________________________________
> Mbsplugins_monkeybreadsoftware.info mailing list
> [email protected]
>
> https://ml01.ispgateway.de/mailman/listinfo/mbsplugins_monkeybreadsoftware.info
>
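
On the actual question of checking for valid UTF-8: rather than hunting for
byte patterns with a regex, one option is to attempt a strict decode and
treat any failure as "not valid UTF-8". The sketch below is in Python rather
than REALbasic, purely to show the idea; the function names are made up for
illustration and are not part of any plugin. It assumes you have the raw
bytes of the converted string available.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as strict UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def first_invalid_offset(data: bytes) -> int:
    """Return the byte offset of the first invalid sequence, or -1 if none.

    Hypothetical helper: handy for logging which part of an imported
    file was mis-detected.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        return exc.start
    return -1


if __name__ == "__main__":
    samples = [
        "café".encode("utf-8"),    # properly encoded UTF-8
        "café".encode("latin-1"),  # Latin-1 bytes, not valid as UTF-8
        b"\xef",                   # a lone lead byte with no continuation
    ]
    for raw in samples:
        print(raw, is_valid_utf8(raw), first_invalid_offset(raw))
```

A lone EF byte is only the start of a three-byte sequence, so on its own it
fails the check. Note also that a string can pass this test and still be
rejected by PostgreSQL if it contains a NUL (0x00) byte, which text columns
do not accept.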



-- 
---------------------------------------------
Peter K. Stys, MD
Dept. of Clinical Neurosciences
Hotchkiss Brain Institute
University of Calgary
tel (403) 210-8646
---------------------------------------------