Geoffrey Sneddon wrote:
> Yeah, I started an entire Unicode implementation in userland PHP.
> Let's just say it became rather large while getting nowhere. :)

So, the trick is to only use strpos() on well-formed UTF-8, and you're golden. :-)

> This isn't really a case of the built-in implementation not working,
> it's just the built-in implementation is defined to use either UCS2 or
> UCS4 depending on a compile-time flag, which can end up being rather
> fun to deal with (look at ifragment in anolislib/utils.py in Anolis
> for example).

That's an absolutely horrid piece of code, having to match for surrogate pairs yourself. Does Python use PCRE, by any chance?

Cheers,
Edward
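
For what it's worth, here is a minimal Python sketch of why a plain byte search (which is all strpos() does) should be safe on well-formed UTF-8: lead bytes and continuation bytes occupy disjoint ranges, so a well-formed needle can only match starting on a character boundary. The byte_find name is hypothetical, made up for illustration; it is not from html5lib or Anolis.

```python
def byte_find(haystack: str, needle: str) -> int:
    """Find needle in haystack using only a byte-level search.

    Hypothetical helper for illustration. Returns the code point
    index of the first match, or -1 if there is none.
    """
    hay = haystack.encode("utf-8")
    ndl = needle.encode("utf-8")

    # Plain byte search, analogous to what strpos() does in PHP.
    pos = hay.find(ndl)
    if pos < 0:
        return -1

    # Because both inputs are well-formed UTF-8, the match starts on a
    # character boundary, so the prefix decodes cleanly. Counting its
    # code points turns the byte offset into a character offset.
    return len(hay[:pos].decode("utf-8"))


# The byte-level result agrees with the native code point search.
print(byte_find("naïve café", "café"))   # 6
print("naïve café".find("café"))         # 6
```

The guarantee only holds when both strings really are well-formed UTF-8, which is why validating input up front matters more than reimplementing the string functions.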
