S. Isaac Dealey wrote:
>
> This brings up a good question and something I've been thinking about for
> several months. At some point I plan to ( or expect I will be asked to )
> provide support for multiple languages / internationalization in my content
> management system.
>
> How will character sets from the client browser affect the form or url
> submissions?
Always inconsistent and always wrong. :)

> Is it a good idea to simply declare a default character set on every page
> and then perform some sort of conversion when necessary for alternate
> languages? Will this make the app more stable or will it only make it more
> difficult to support other character sets?

First, disconnect the ideas of charset and language. ISO 8859-15 (a.k.a. LATIN9), for instance, is a charset that is sufficient for most West European languages. On the other hand, if you want to properly support Japanese you need Hiragana, Katakana and Kanji (a charset such as Shift_JIS combines them into one package, but you get the point). So first you have to determine whether you actually need different charsets, or just different languages. If you just need different languages from the same charset, use that charset, and all you have to worry about is making sure you have a translator for the content.

If you need different charsets, you have a problem: it is not possible to have multiple charsets on one page. The solution to this mess is Unicode. Unicode is designed from the ground up to be the charset that has ALL characters. Every character from every charset. The charset that will end all charsets :) One of the funny results is that Unicode has over 20 whitespace characters, and all have a different meaning. But it does work, and all characters are in Unicode (OK, maybe not Sumerian cuneiform from 4000 BC, but if not, they are certainly working on it).

So how do you use a specific charset? CF MX internally is no problem. It will use the charset the templates are in (if detected by the BOM) or the system locale. You can override this per template with <cfprocessingdirective> (for example <cfprocessingdirective pageEncoding="utf-8">).

Databases might or might not be a problem. Many will require you to use N-type fields (such as NCHAR and NVARCHAR; the N stands for National, as in the SQL-92 NATIONAL CHARACTER type) if you want to store multi-byte characters. Some will just fail.
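The charset-versus-language distinction and the multi-byte point can be made concrete with a small sketch. It is Python rather than ColdFusion and is only an illustration. Python's codec names "iso8859_15", "shift_jis" and "utf-8" stand in for the charsets mentioned above, and the sample strings and the fits() helper are made up for the example.

```python
# Illustration only: Python codecs stand in for the charsets discussed above.
# The helper fits() and the sample strings are hypothetical.

def fits(text: str, charset: str) -> bool:
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

samples = {
    "French": "Où est le café ?",
    "German": "Grüße aus Köln",
    "Japanese": "日本語のテキスト",  # Kanji, Hiragana and Katakana
}

# Different languages, same charset: Latin-9 covers French and German,
# but not Japanese, which needs something like Shift_JIS.
for name, text in samples.items():
    print(name,
          "latin9:", fits(text, "iso8859_15"),
          "shift_jis:", fits(text, "shift_jis"),
          "utf-8:", fits(text, "utf-8"))

# One page mixing both scripts only works in a Unicode encoding.
mixed = samples["German"] + " / " + samples["Japanese"]
print("mixed in latin9:", fits(mixed, "iso8859_15"))
print("mixed in utf-8:", fits(mixed, "utf-8"))

# Multi-byte: two characters, but more than two bytes to store.
word = "日本"
print(len(word), len(word.encode("utf-8")), len(word.encode("shift_jis")))  # prints: 2 6 4
```

The last line shows why column types matter. Two Japanese characters occupy 6 bytes in UTF-8 and 4 in Shift_JIS. A database that counts bytes or rejects multi-byte data in ordinary character fields will cause trouble.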
Check your database's documentation for specifics (don't forget to look for a Translate() function, which is a new SQL:99 function that can translate from one charset to another, if implemented). Read, read, read. Test, test, test. Then, of course, there is the issue of the database drivers supporting the required charsets. For instance, the Access drivers that come with MX will not support Unicode.

Forget about the web server; it is not important for this.

So we get to the browser. The first thing is that you have to tell the browser exactly what you are sending to it. Use cfcontent for that (for example <cfcontent type="text/html; charset=utf-8">); it has the highest priority of all options. [1] This should solve most issues with characters being displayed incorrectly in the browser. If I am not mistaken, if you see a square, the font does not have the appropriate glyph, and if you see question marks, the character was lost because it is not present in the charset in use (in which case it is time to check whether your browser is on auto-detect and to run your HTML through a validator such as http://validator.w3.org/).

(A safe font used to be Arial Unicode MS, which has a very large collection of glyphs, but in the never-ending push for revenue this font is no longer available for download from the MS website and is only distributed together with Office. If somebody happens to have a copy of the install file, please mail me off-list.)

Last is the data being returned from the browser. Use the SetEncoding() function (for example SetEncoding("form", "utf-8") and SetEncoding("url", "utf-8")) to specify the correct charset for it. Users can break this on purpose: if you deliberately override the charset in your browser (under View), you can send something to the server that the server doesn't expect. A typical case of "garbage in, garbage out".

I am sure some people have lots to add, but I think these are the basics.
[1] http://www.w3.org/International/O-charset.html

Links of interest:
http://www.macromedia.com/support/coldfusion/internationalization.html
http://www.unicode.org/
ftp://ftp.isi.edu/in-notes/rfc2277.txt

Jochem

