Hi Dan and Hans,

Since it is mostly the three of us walking the utf-8 road together, I'd like to say a little about how I feel. I'm not a programmer, so I can't discuss the technical problems, but I would like to share the sentimental side.

People outside the English-speaking world are all used to seeing English, especially in URLs. The day Hans first solved the problem and real Chinese characters appeared in the URL bar, I was so touched that I couldn't hold back my tears. They were happy tears. It is so rare, and after so many days of waiting, BoltWire achieved it. Only a few other programs can do the same. (Even PmWiki could not do it perfectly.) This puts BoltWire among the most advanced of all CMSes in i18n.

I'm not a Christian, but I have always admired the missionaries who go to foreign countries and use the local language to reach people. Utf-8 page names may not add any function to BoltWire, but they make people outside the English-speaking world LOVE BoltWire.

Cheers,
linly

On Mar 11, 12:02 am, Hans <[email protected]> wrote:
> 2009/3/10 The Editor <[email protected]>:
>
> > 1) Latin based alphabets, as Hans mentioned. (On an earlier issue, we
> > could make our config setting pageNames: ascii or utf-8 with one of
> > them default). My guess is most users of accented latin-based letters
> > would prefer to be able to read a pagename like tèst as test rather
> > than t%e27st (or whatever it is). Creating a plugin of alternate
> > actions to automatically save titles with accents might be a useful
> > addition to this.
>
> I differ here. Web developers were forced to use basic ASCII in the
> past. The growing support of Unicode with utf-8 encoding and decoding
> makes it more and more anachronistic to use page names in urls with
> stripped diacritics.
> Like for instance the German word Öl (oil). A stripped page name like
> Oel is just about readable to a German, but not friendly. and it fails
> in Google searches totally. For Google you need the proper utf-8
> encoded word, either as Öl or as % encoded url.
>
> Also more browsers support utf-8 in their address bar: The % encoded
> url http://de.wikipedia.org/wiki/%C3%96le reads in Firefox
> as http://de.wikipedia.org/wiki/Öle.
> So it is getting more and more
> convenient to use utf-8 always, as well as more practical and search
> engine friendly.
>
> Therefore I think that utf-8 page names should be the default. If you
> only use basic Latin characters, without diacritics, and no characters
> from other languages, you won't even notice. If you do, you will like
> the default support of these. If you are a Western European who wants
> to have diacritic characters transformed into basic Latin ones, you
> should be able to set a config switch. And switching to such a
> transformation mode should still show pages with the proper diacritics
> in the titles. This should be automatic, and no plugin installation
> needed for it. It really annoys me to enter something like Öl as page
> name for a new page, and then the page not only shows oel in the url,
> but Oel as the page title, so that i immediately have to set the title
> to Öl.
>
> > 2) Speed. I thought the extra handling of utf conversions might affect
> > speed. But on my tests with utf pages so far, they seem slighter
> > faster. Curious.
>
> Good!
>
> > 3) Security. I am not sure what security implications there are of
> > allowing % encoded text in a page. For example, what if someone
> > encoded a javascript snippet and then cut and paste the script to a
> > page. If BoltWire simply unencoded it, would we be opening ourselves
> > to an xss hack? Likely. (This by the way is one reason I have
> > everything run through a pair of central utf2url/url2utf functions--we
> > can easily add extra precautions if needed systemwide).
>
> I don't think this is an issue. Need to do some testing though.
>
> > 4) I've not thought through all the case sensitivity issues. In most
> > things BoltWire is case insensitive by design. I'm not sure how
> > switching over will affect this. And particularly with diacritics that
> > DO have case conversions. I have some code for utf case changing, but
> > it's not all tested etc.
>
> There may be some issues here. Hopefully not too big ones to tackle!
>
> > 5) To be honest, I personally like it in the nice safe confines of
> > simple ascii page names, and probably for most English sites, it is
> > easiest. And maybe mostly for this reason, I'm still inclined to keep
> > ascii the default. :)
>
> Well, that sounds very US centric ;-)). Time to embrace the world!
> Even Canada and most European countries needed to deviate from the US
> ASCII code standard and create their own versions to accommodate
> diacritic characters etc. (before Unicode was developed).
>
> Now with Unicode we can be much broader and step outside the
> confinements of basic ASCII.
> Just think of basic ASCII as a specialist subset of Unicode, and write
> the program code for the general not the specialist case.
>
> If you followed my posts about this you noticed that i was rather
> sceptical that it would be doable. I thought it needed a different
> approach, not just fixing things here and there. Well, we got quite
> far with fixing things, so why not sail with it?
>
> cheers,
> ~Hans
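
For anyone who wants to try the points from the quoted discussion by hand, here is a rough sketch. It is written in Python purely for illustration and is not BoltWire code. The names utf2url and url2utf are borrowed from the thread, but these bodies are stand-ins I am assuming, and strip_diacritics and safe_display are hypothetical helpers that do not come from the thread at all.

```python
import html
import unicodedata
from urllib.parse import quote, unquote


def utf2url(name: str) -> str:
    """Percent-encode a page name as UTF-8 (stand-in, not BoltWire's function)."""
    # Normalise to NFC first so "Ö" is always one code point before encoding.
    return quote(unicodedata.normalize("NFC", name), safe="")


def url2utf(encoded: str) -> str:
    """Decode a percent-encoded page name back to text (stand-in)."""
    return unquote(encoded)


def strip_diacritics(name: str) -> str:
    """Hypothetical 'ascii mode': drop combining accents after decomposition."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def safe_display(encoded: str) -> str:
    """Decode, then HTML-escape before output, so decoded markup stays inert."""
    return html.escape(url2utf(encoded))


print(utf2url("Öle"))                    # %C3%96le
print(url2utf("%C3%96le"))               # Öle
print(strip_diacritics("tèst"))          # test
print(strip_diacritics("Öl"))            # Ol
print(safe_display("%3Cscript%3E"))      # &lt;script&gt;
print("ÖL".casefold() == "öl".casefold())  # True
```

Note that plain accent stripping turns Öl into Ol, not the Oel a German reader expects, so an ascii mode would need a per-language table on top of this. Escaping at output time is only one of the possible precautions for point 3, and casefold only hints at the case questions in point 4.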
