On Tue, Mar 10, 2009 at 6:22 AM, Hans <[email protected]> wrote:
>
> 2009/3/10 Linly <[email protected]>:
>>
>> I have set "utfPages: true" in "site.config". I don't know whether
>> this is the reason.
>
> Yes, that did the trick.
>
> I find the term 'utfpages' confusing. All web pages are already utf-8
> encoded as standard, via the declaration in the skin's HTML head, so
> utf-8 text in the content displays fine.
>
> But 'utfpages: true' refers to utf-8 encoded page names only.
> 'utfPageNames: true' would be a lot clearer as a config switch.
> And I think this should be true by default, just as utf-8 character
> decoding is set by default via the skin's HTML head.
> ASCII text in page names still appears as normal, since utf-8 encodes
> the lower ASCII characters exactly as plain ASCII.
>
> The only issue is if Western European users want their diacritic
> characters transformed into non-diacritic lower ASCII characters.
> That case deserves its own config variable, rather than one that
> governs the use of utf-8 in page names in general.
> Perhaps a var called simpleAsciiPageNames or basicAsciiPageNames.

Thanks again for the extremely fast testing and bug reporting. I'll
start working on fixes for what's been reported so far and try to
issue a release later today or tomorrow. I have a lot of catching up
to do on other things, but most of these don't seem too difficult to
resolve.

Plus, we may get better results by thinking about it a bit first.
I'm leaning toward doing all the conversion at the very last minute,
as that should solve many problems. But it might cause problems with
the links, and perhaps other things. Or maybe none at all. It will
just take some experimentation. Fortunately a change like this will be
under the hood and won't disrupt sites, which is my biggest concern
right now: getting something stable that folks like Linly can begin
using.

As for the config setting, I have no problem with changing utfpages to
utfpagenames, though I'm not convinced it adds that much semantically.
We use the variable {page} to represent the page (name), and that has
never been confusing as far as I know. Similarly, when scrolling down
my pages dir, I don't find it difficult to tell whether a page is
ascii or utf8, depending on whether its name is some.page or
%e2%a7.etc. Still, it's just a question of changing the parameter.
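For anyone following along, here is a minimal Python sketch of how a utf page name can map to a %-encoded file name like the ones in my pages dir. It assumes page names are stored as their UTF-8 bytes in %XX form; the function names are hypothetical and this is not BoltWire's actual PHP code. Note that plain ASCII names come through unchanged, which is Hans's point above. Python's quote() writes uppercase hex digits, while the file names above are lowercase, so a real implementation may normalise that case.

```python
from urllib.parse import quote, unquote


def page_to_filename(page: str) -> str:
    # ASCII letters, digits and "_.-~" pass through unchanged; every other
    # character is encoded as UTF-8 and each byte written as a %XX escape.
    return quote(page, safe="")


def filename_to_page(name: str) -> str:
    # Reverse the mapping: turn %XX escapes back into UTF-8 text.
    return unquote(name, encoding="utf-8")


print(page_to_filename("some.page"))  # some.page
print(page_to_filename("tèst"))       # t%C3%A8st
```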

The bigger issue, perhaps, is whether or not it should be the default
setting. My original plan was to have only ascii page names, and I
only hesitantly added minimal utf page name capabilities little by
little, never quite expecting we would end up with what we have now.
And it was never my intention to make it the default behavior, just an
option I figured we'd stretch as far as we could without breaking
things.

But like the conversion to utf encoding, once we got the combination
right, it turned out to be simpler than expected, and quite cool. So
I'm open to considering changing the default setting. I'm just still
hesitant. It does have the nice advantage of titles not being stripped
of diacritics. My concerns are:

1) Latin-based alphabets, as Hans mentioned. (Going back to an earlier
issue, we could make our config setting pageNames: ascii or utf-8, with
one of them as the default.) My guess is most users of accented
Latin-based letters would prefer to read a page name like tèst as test
rather than as its %-encoded form, t%c3%a8st. A plugin of alternate
actions that automatically saves titles with accents might be a useful
addition to this.
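To make the ascii option concrete, here is a rough Python sketch of stripping diacritics (tèst to test) plus a pageNames switch with ascii and utf-8 modes. Both the function names and the exact switch values are hypothetical, just illustrating the idea above. It uses Unicode NFKD decomposition, which splits an accented letter into its base letter plus a combining mark and then drops the mark. Characters with no decomposition (ß, æ, CJK) are simply dropped, which is one reason a utf-8 mode is still worth having.

```python
import unicodedata


def ascii_page_name(title: str) -> str:
    # NFKD splits "è" into "e" followed by U+0300 COMBINING GRAVE ACCENT.
    decomposed = unicodedata.normalize("NFKD", title)
    # Drop the combining marks, keeping the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII has no plain equivalent and is removed.
    return stripped.encode("ascii", "ignore").decode("ascii")


def page_name(title: str, mode: str = "ascii") -> str:
    # Hypothetical "pageNames" setting: "ascii" (default) or "utf-8".
    if mode == "ascii":
        return ascii_page_name(title)
    if mode == "utf-8":
        return title
    raise ValueError(f"unknown pageNames mode: {mode!r}")


print(page_name("Crème Brûlée"))         # Creme Brulee
print(page_name("Crème Brûlée", "utf-8"))  # Crème Brûlée
```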

2) Speed. I thought the extra handling of utf conversions might affect
speed. But in my tests with utf pages so far, they seem slightly
faster. Curious.

3) Security. I am not sure what the security implications are of
allowing %-encoded text in a page. For example, what if someone
encoded a javascript snippet and then pasted the encoded script into a
page? If BoltWire simply decoded it, would we be opening ourselves to
an xss hack? Likely. (This, by the way, is one reason I have
everything run through a pair of central utf2url/url2utf functions: we
can easily add extra precautions systemwide if needed.)
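Here is a rough idea of the kind of precaution a central decoder makes easy, as a Python sketch. The names utf2url/url2utf come from my note above, but these bodies are only an illustration, not BoltWire's code. The assumption is that decoded text is always HTML-escaped before it can reach a page, so an encoded <script> tag comes back as harmless text. HTML-escaping covers normal page and attribute output; text placed inside a script or URL context would need its own handling.

```python
import html
from urllib.parse import quote, unquote


def utf2url(text: str) -> str:
    # Encode UTF-8 text into %XX form.
    return quote(text, safe="")


def url2utf(encoded: str) -> str:
    # Decode %XX escapes; malformed byte sequences become U+FFFD instead of failing.
    decoded = unquote(encoded, encoding="utf-8", errors="replace")
    # Escape &, <, >, " and ' so decoded input cannot inject markup.
    return html.escape(decoded, quote=True)


print(url2utf("%3Cscript%3Ealert(1)%3C%2Fscript%3E"))
# &lt;script&gt;alert(1)&lt;/script&gt;
```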

4) I've not thought through all the case sensitivity issues. In most
things BoltWire is case insensitive by design. I'm not sure how
switching over will affect this, particularly with accented
characters that DO have case conversions. I have some code for utf
case changing, but it's not fully tested.
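One way to see the case problem, as a small Python sketch with hypothetical helper names: an ASCII-only lowercase routine leaves È alone, and comparing the %-encoded forms case-insensitively doesn't help either, because È and è encode to different bytes (%C3%88 vs %C3%A8), not just different hex case. The assumption here is that case folding has to happen on the decoded text, before encoding. Python's casefold() also handles odd cases, such as ß folding to ss.

```python
from urllib.parse import quote


def ascii_lower(name: str) -> str:
    # Lowercase only A-Z, as a byte-oriented routine would.
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name)


def page_key(name: str) -> str:
    # Unicode-aware case-insensitive key for comparing page names.
    return name.casefold()


print(ascii_lower("TÈST"))                       # tÈst
print(page_key("TÈST"))                          # tèst
print(quote("È", safe=""), quote("è", safe=""))  # %C3%88 %C3%A8
```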

5) To be honest, I personally like it in the nice safe confines of
simple ascii page names, and probably for most English sites, it is
easiest. And maybe mostly for this reason, I'm still inclined to keep
ascii the default.  :)

Cheers,
Dan

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"BoltWire" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/boltwire?hl=en
-~----------~----~----~----~------~----~------~--~---
