Begs the bigger question: why doesn't the world just settle on UTF-8 and be done with this encoding nonsense once and for all? It continues to drive me crazy, and I've been writing code for over 30 years!
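
As for the question quoted below (how to check for a valid UTF-8 string), one approach that avoids regex altogether is to attempt a strict decode of the raw bytes and treat any failure as "not valid UTF-8". Here is a minimal sketch in Python, purely as an illustration and not the plugin's or the IDE's API. The function name `is_valid_utf8` is made up for this example, and it assumes you can get at the raw bytes of the string before any conversion is applied.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 sequence.

    Hypothetical helper for illustration: it relies on Python's strict
    UTF-8 decoder, which rejects truncated sequences, stray continuation
    bytes, overlong encodings and encoded surrogates.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # A lone 0xEF is the start of a three-byte sequence, so on its own
    # it is incomplete and therefore invalid.
    print(is_valid_utf8(b"\xef"))
    # The same byte followed by 0xBB 0xBF is the UTF-8 byte order mark,
    # which is a complete and valid sequence.
    print(is_valid_utf8(b"\xef\xbb\xbfhello"))
```

The same idea should carry over to any language whose decoder can be asked to fail on malformed input instead of silently substituting replacement characters. Running the check on the converted string just before the database insert would catch the cases where the encoding guess was wrong.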
On Tue, Feb 12, 2013 at 4:10 AM, Lee Badham <[email protected]> wrote:

> Hi,
>
> I've been trying to use the Universal Encoding detection plugin to test
> for encoding of files we import because we never know their encoding.
>
> Most of the time it gets it right, but sometimes it does not.
>
> Once the encoding is guessed, the strings are then converted to UTF-8 to
> be stored in a PostgreSQL database which complains if there is an invalid
> UTF-8 sequence.
>
> So how can I check for a valid UTF-8 string?
>
> The solutions I've found online involve using Regex to check for a certain
> byte sequence.
>
> I can't get this to work at all properly using either the built in Regex,
> or RegexMBS.
>
> if re.Compile("\xEF") then
>
> Does not find the hex value EF in the string, even though when I look at
> the byte of the string from the IDE it is there.
>
> What am I doing wrong?
>
> Lee Badham
>
> www.bodoni.co.uk | www.presssign.com
>
> _______________________________________________
> Mbsplugins_monkeybreadsoftware.info mailing list
> [email protected]
> https://ml01.ispgateway.de/mailman/listinfo/mbsplugins_monkeybreadsoftware.info

--
---------------------------------------------
Peter K. Stys, MD
Dept. of Clinical Neurosciences
Hotchkiss Brain Institute
University of Calgary
tel (403) 210-8646
---------------------------------------------
