2009/3/10 The Editor <[email protected]>:

> 1) Latin based alphabets, as Hans mentioned. (On an earlier issue, we
> could make our config setting pageNames: ascii or utf-8 with one of
> them default). My guess is most users of accented latin-based letters
> would prefer to be able to read a pagename like tèst as test rather
> than t%e27st (or whatever it is). Creating a plugin of alternate
> actions to automatically save titles with accents might be a useful
> addition to this.

I differ here. Web developers were forced to use basic ASCII in the
past. The growing support of Unicode with utf-8 encoding and decoding
makes it more and more anachronistic to use page names in urls with
stripped diacritics.
Take, for instance, the German word Öl (oil). A stripped page name like
Oel is just about readable to a German, but not friendly, and it fails
completely in Google searches. For Google you need the properly utf-8
encoded word, either as Öl or as a % encoded url.

Also more browsers support utf-8 in their address bar: The % encoded
url http://de.wikipedia.org/wiki/%C3%96le reads in Firefox as
http://de.wikipedia.org/wiki/Öle. So it is getting more and more
convenient to use utf-8 always, as well as more practical and search
engine friendly.
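
To illustrate the round trip between the readable and the % encoded form,
here is a minimal sketch using Python's standard urllib. The function names
encode_page_name and decode_page_name are made up for this example and are
not BoltWire functions.

```python
from urllib.parse import quote, unquote

def encode_page_name(name: str) -> str:
    """Percent-encode a page name as UTF-8 bytes (hypothetical helper)."""
    # safe="" also encodes "/", so the result is a single path segment.
    return quote(name, safe="", encoding="utf-8")

def decode_page_name(encoded: str) -> str:
    """Turn a % encoded page name back into readable text (hypothetical helper)."""
    return unquote(encoded, encoding="utf-8")

print(encode_page_name("Öle"))        # %C3%96le
print(decode_page_name("%C3%96le"))   # Öle
```

Plain ASCII names pass through unchanged, which is why users of basic
Latin names would not notice a utf-8 default.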

Therefore I think that utf-8 page names should be the default. If you
only use basic Latin characters, without diacritics, and no characters
from other languages, you won't even notice. If you do, you will like
the default support of these. If you are a Western European who wants
to have diacritic characters transformed into basic Latin ones, you
should be able to set a config switch. And switching to such a
transformation mode should still show pages with the proper diacritics
in the titles. This should be automatic, with no plugin installation
needed. It really annoys me to enter something like Öl as the page
name for a new page, and then find that the page not only shows oel in
the url, but also Oel as the page title, so that I immediately have to
set the title to Öl.
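
The behaviour described above could look roughly like the following sketch.
Everything in it is an assumption for illustration: the make_page function,
the page_names parameter (modelled on the pageNames setting proposed
earlier), and the small German replacement table. It is not how BoltWire
does it.

```python
import unicodedata

# Hypothetical replacements applied before generic accent stripping,
# so that Öl becomes Oel rather than Ol.
GERMAN = {"Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
          "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}

def make_page(name: str, page_names: str = "utf-8") -> dict:
    """Return the url name and the display title for a new page."""
    if page_names == "ascii":
        replaced = "".join(GERMAN.get(ch, ch) for ch in name)
        # NFKD splits a letter like è into e plus a combining accent;
        # encoding to ASCII with "ignore" then drops the accent.
        decomposed = unicodedata.normalize("NFKD", replaced)
        url_name = decomposed.encode("ascii", "ignore").decode("ascii")
    else:
        url_name = name
    # The title always keeps the name exactly as the user typed it.
    return {"url_name": url_name.lower(), "title": name}

print(make_page("Öl"))             # {'url_name': 'öl', 'title': 'Öl'}
print(make_page("Öl", "ascii"))    # {'url_name': 'oel', 'title': 'Öl'}
```

The point of the sketch is that the transformation only touches the url
name, so nobody has to correct the title by hand afterwards.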

> 2) Speed. I thought the extra handling of utf conversions might affect
> speed. But on my tests with utf pages so far, they seem slightly
> faster. Curious.

Good!

> 3) Security. I am not sure what security implications there are of
> allowing % encoded text in a page. For example, what if someone
> encoded a javascript snippet and then cut and paste the script to a
> page. If BoltWire simply unencoded it, would we be opening ourselves
> to an xss hack? Likely.  (This by the way is one reason I have
> everything run through a pair of central utf2url/url2utf functions--we
> can easily add extra precautions if needed systemwide).

I don't think this is an issue. Need to do some testing though.
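
One way to test this, as a rough sketch only: decode the % encoded text
first, then escape it before it goes into the HTML output. The function
name render_decoded is hypothetical, and Python's html.escape stands in
for whatever escaping the output code actually applies.

```python
from html import escape
from urllib.parse import unquote

def render_decoded(encoded: str) -> str:
    """Decode % encoded text, then escape it for safe HTML output."""
    decoded = unquote(encoded, encoding="utf-8")
    # Escaping after decoding means an encoded <script> tag cannot
    # reach the browser as markup.
    return escape(decoded, quote=True)

print(render_decoded("%3Cscript%3Ealert(1)%3C%2Fscript%3E"))
# &lt;script&gt;alert(1)&lt;/script&gt;
print(render_decoded("%C3%96l"))
# Öl
```

If the escaping ran before the decoding instead, the encoded script would
slip through, so the order of the two steps is the thing to check.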

> 4) I've not thought through all the case sensitivity issues. In most
> things BoltWire is case insensitive by design. I'm not sure how
> switching over will affect this. And particularly with diacritics that
> DO have case conversions. I have some code for utf case changing, but
> it's not all tested etc.

There may be some issues here. Hopefully none too big to tackle!
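
As a small illustration of case insensitive matching with diacritics, here
is a sketch using Python's built-in Unicode support. The name_key function
is hypothetical; it only shows the idea of comparing page names through a
normalized, case-folded key.

```python
import unicodedata

def name_key(name: str) -> str:
    """Build a comparison key that ignores case but keeps diacritics."""
    # NFC makes "O" + combining diaeresis equal to the single letter Ö,
    # casefold() then handles case for non-ASCII letters as well.
    return unicodedata.normalize("NFC", name).casefold()

print(name_key("ÖL") == name_key("öl"))        # True
print(name_key("O\u0308l") == name_key("Öl"))  # True
print(name_key("Öl") == name_key("Ol"))        # False
```

Note that this keeps Öl and Ol as different pages, which is what you want
when diacritics are preserved in page names.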

> 5) To be honest, I personally like it in the nice safe confines of
> simple ascii page names, and probably for most English sites, it is
> easiest. And maybe mostly for this reason, I'm still inclined to keep
> ascii the default.  :)

Well, that sounds very US centric ;-)). Time to embrace the world!
Even Canada and most European countries needed to deviate from the US
ASCII code standard and create their own versions to accommodate
diacritic characters etc. (before Unicode was developed).

Now with Unicode we can be much broader and step outside the
confines of basic ASCII.
Just think of basic ASCII as a specialist subset of Unicode, and write
the program code for the general case, not the specialist one.

If you followed my posts about this, you will have noticed that I was
rather sceptical that it would be doable. I thought it needed a different
approach, not just fixing things here and there. Well, we got quite
far with fixing things, so why not sail with it?

cheers,
~Hans
