On Wed, Mar 11, 2009 at 5:04 AM, Linly <[email protected]> wrote:
>
> I've been thinking that the %-encoded form should ONLY be saved in one
> place - the file name in the file system. Other than that, inside the
> content of a page, no matter whether it is an interwiki link, info
> data, an index or anything else, it should be saved as normal UTF-8 text.
>
> When an interwiki link is rendered to an HTML link, the %-encoding is
> applied at that moment.
>
> Is my understanding right or wrong?


Thank you very much for your testing. This really helps. It also shows,
as Hans pointed out, that we have taken kind of a patchwork approach to
slowly adding UTF support, mostly because I had no clue how to do it
initially. :)

The answer to your question is "probably".

First we need to really think about the best place to encode/decode
the UTF characters so that we get maximum functionality with the
simplest code. Switching the decoding to the final output of the
markup table (as Hans suggested) will simplify a lot of code and make
sure we never see a %-encoded URL anywhere (unless escaped--a whole
other issue). And while we probably don't want the underlying page
content %-encoded (for those who snoop around in there), if we did,
there are some real possibilities that could make UTF page names work
anywhere in the system. For example, there's no reason we couldn't
define some custom system var like {二} and get it to work. Or a
function like [(二 ....)]. (We might already be able to do this with
mapping.) Crazy possibilities.
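To make the idea concrete, here is a minimal sketch (in Python, not BoltWire's actual code, and with hypothetical function names) of the "encode only at the edges" approach: page names live as plain UTF-8 everywhere internally, the on-disk filename is the one %-encoded place, and the HTML link encodes at render time.

```python
# Sketch only: illustrates storing page names as plain UTF-8 internally
# and percent-encoding only at the filesystem and HTML-output boundaries.
# filename_for() and render_link() are hypothetical names, not BoltWire APIs.
from urllib.parse import quote

def filename_for(pagename: str) -> str:
    # The on-disk file name is the ONLY place the %-encoded form is stored.
    return quote(pagename, safe="") + ".txt"

def render_link(pagename: str) -> str:
    # %-encoding happens at render time, so markup rules, info data,
    # indexes, etc. all work with readable UTF-8 text.
    return f'<a href="/{quote(pagename, safe="")}">{pagename}</a>'

print(filename_for("二"))   # %E4%BA%8C.txt
print(render_link("二"))    # <a href="/%E4%BA%8C">二</a>
```

With this split, nothing between input and output ever needs to see a %-encoded name, which is what makes things like a {二} system var plausible.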

However, I have two concerns.

1) Security. Though I haven't yet tested it, I'm concerned someone
could url_encode an XSS hack, drop it into any BoltWire comment box,
and wreak havoc. It would bypass all filters (I have had to add %'s to
most filters now to admit page names), and then if BoltWire blindly
decoded everything, it could output perfectly formed JavaScript to the
page. This may already be a vulnerability if you have the new UTF
pagenames enabled.

2) Filters. We may have real issues when all our filters become
essentially meaningless, not to mention problems with markup rules.
The purpose of filters is to validate and clean user input, and to
block disruptive or malicious code from being entered. But it rather
defeats the whole idea if anything can be entered anywhere: it gets
urlencoded and sails past the filters with ease. Fun but scary.
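Both concerns boil down to the same mechanism. Here is a toy demonstration (Python, with a deliberately naive hypothetical filter, nothing to do with BoltWire's real filter code) of how a %-encoded payload slips past a pattern-based filter and is then reconstructed by a blind decode:

```python
# Toy demonstration of the filter-bypass concern. naive_filter() is a
# hypothetical stand-in for any filter that matches literal patterns.
from urllib.parse import unquote

def naive_filter(text: str) -> str:
    # Blocks the literal string "<script" -- but cannot see encoded forms.
    if "<script" in text.lower():
        raise ValueError("blocked")
    return text

payload = "%3Cscript%3Ealert(1)%3C/script%3E"
filtered = naive_filter(payload)   # sails through: no literal "<script"
decoded = unquote(filtered)        # "<script>alert(1)</script>"
print(decoded)
```

Which suggests the safe order is filter-after-decode (or decode only at trusted output points), never filter-then-decode.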

I've also been really busy, and will be all this week, so the next
release may not be out till next week, unless I can squeeze in a
couple of quick fixes. It probably won't be till then that I can get
some time to think through a more systemic way to integrate our new
UTF capabilities into BoltWire.

Cheers,
Dan

P.S. Input, particularly at the theoretical level right now, would be
really helpful. I'm almost persuaded by the recent posts in support of
making UTF the default, but I'm still dragging my feet because of the
potential risks involved.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"BoltWire" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/boltwire?hl=en
-~----------~----~----~----~------~----~------~--~---