Sorry, my last post was to the wrong thread. I'll move it over and respond here:
On Wed, Oct 21, 2009 at 12:47 PM, Hans <[email protected]> wrote:
>
>> Note: you can already enter any of those special chars directly into
>> your browser and/or a link and they work fine, thanks to UTF.
>
> Yes, I said so. It is not as if we are talking about illegal characters.

I know. I was just pointing out that we are not really discriminating against Greek letter users. All languages (including Chinese and Greek) are treated exactly the same. All we are really doing is saying you can't create a page name using html entities. You have to use the actual symbol. While it is not so convenient to enter the html entities on a page, save it, then cut and paste the symbols to your url bar or page name input field, it is easy enough to do. So we also have an easy workaround in place.

>> For instance I just created a page like this: test.Ξ⇔—© (Greek
>> letter, mapping mark, dash, and symbol) and it all works fine. The
>> only thing we are talking about are special characters (mostly
>> punctuation) that tend to produce problems in the core.
>
> No. I don't understand why you insist we are talking about "mostly
> punctuation" characters. This is not true. I gave links to pages with
> lists of HTML 4 entities. And we are not talking about characters
> which "produce problems in the core", I think. As page name characters
> they are url encoded. And characters like <>"# and a few more are not
> allowed and filtered out already.

The only characters you can't use are those in $BOLTutfEscapeChars, which are all punctuation. Everything else gets url encoded and is allowed. I can put anything in a url or link except those chars, because everything else gets url encoded and passes the filter. So while I can't use &copy; there is nothing stopping me from using ©. The real question seems to be whether or not we want to allow direct submission of html entities into page names/address bars, and how we should handle those.
Or to put it differently, it's really not about whether we can use special chars (we can). It's just whether or not we want to allow another mechanism for introducing them, and perhaps expand the possibilities to somehow include the utfEscapeChars as well.

>> Of course you cannot currently enter htmlentities for these chars and
>> have it work. You have to enter the actual symbol. So that might be
>> something worth pursuing. I'm just not convinced--as I much prefer
>> what you have go in come out. Not be automatically translated more
>> than necessary. I'll try and work on this some later today, but I have
>> a busy schedule today...
>
> Okay, I try to convince you: I appreciate the principle of "what goes
> in comes out", but although we can enter symbols covered as html
> entities directly, a) it may not always be convenient, and b) we
> cannot use html entities as code in page names, as the ampersand is a
> special HTML character, and used as argument separator in urls.
>
> So we have the situation where we can enter HTML entities in the page
> content, and have these saved as they are, as code, and displayed
> decoded in a normal page, but we cannot do or expect to do the same
> for page names. Instead we can agree that HTML entities get translated
> to url % codes, and that way they get displayed decoded, as symbols
> etc. in the address bar as well as in page lists, messages etc.

Ok, it took me a minute to figure out what you are saying, but it seems to be that the way we handle page names is not the same as the way we handle page content. That clicks with me. Of course with page content we retain the html entity in the source. There is no original source for a page name, so it is different in at least some regards.

Here is another example of a difference. Suppose we fix boltwire so it can take a page name like page.a&lt;b, change it to page.a<b and then urlencode it and get it to work.
(Hopefully without opening any unanticipated security vulnerabilities somewhere in the process, of course.) But then I couldn't put page.a&lt;b in the browser address bar to create a new page, or go to it, because the browser will interpret that as page.a. And that seems to be something hardcoded in the browser... In other words, we have an inconsistency in that you can enter something in a create pagename field, but not that same string in the url bar. Don't like that.

Of course the flip side is also true: I can put [[page.©]] and [[page.&copy;]] on a page and both are identical. But there is a huge difference between page.© and page.&copy; when it comes to my pagename input field. So we're inconsistent there... Which is your point of course.

If the only problem is the input fields (I don't think we can solve the address bar issue), doesn't it make sense to add a single line (maybe in BOLTXtarget, or BOLTpageshortcuts) that automatically decodes any html entities in a pagename before it is filtered? It would then be essentially as if you had entered the correct characters. This way we still block our dozen or so problem characters, but not any other characters... And no worries about any other changes in the core code... What do you think of this idea?

In summary:
1) I am concerned about possible security vulnerabilities from allowing risky chars in page names.
2) I am worried about possible bugs, such as special chars being entered that have pageshortcut meanings, and confusion with get variables.
3) I am concerned that certain pages could be created via a form but not created via the url bar.

I realize, however, that there's probably nothing we can do about the url bar either way. I can live with the fact that the html entity I type in comes back out differently as a page name; after all, that's what happens with page content. And if an html entity is entered, it was likely intentional.
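To make the decode-before-filter idea concrete, here is a rough sketch in Python rather than BoltWire's PHP. It is not the actual BOLTXtarget or BOLTpageshortcuts code. The names normalize_page_name, url_form and BLOCKED_CHARS, and the `p` query parameter, are made up for illustration, and the blocked set is only a stand-in for $BOLTutfEscapeChars, whose real contents are not listed in this thread.

```python
import html
from urllib.parse import parse_qs, quote

# Hypothetical stand-in for $BOLTutfEscapeChars (assumption, not the real list).
BLOCKED_CHARS = set('<>"#&')


def normalize_page_name(raw: str) -> str:
    """Decode HTML entities first, then drop the blocked punctuation."""
    decoded = html.unescape(raw)  # "&copy;" becomes "©", "&lt;" becomes "<"
    return "".join(ch for ch in decoded if ch not in BLOCKED_CHARS)


def url_form(page_name: str) -> str:
    """Percent-encode the UTF-8 bytes of a page name, keeping the dot."""
    return quote(page_name, safe=".")


# Typing the entity or the symbol into an input field gives the same name.
same_name = normalize_page_name("page.&copy;") == normalize_page_name("page.©")

# A decoded "<" is still caught by the filter, so nothing risky gets through.
filtered = normalize_page_name("page.a&lt;b")

# In a url the raw "&" already acts as an argument separator, so the entity
# never reaches the page name intact (the "p" parameter is hypothetical).
from_url = parse_qs("p=page.a&lt;b")
```

Under these assumptions the input field treats page.&copy; and page.© alike, the blocked characters stay blocked, and the address bar case is still cut off at page.a, which matches concern 3) above.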
I don't sense a great burden to allow the entry of punctuation into page names, and feel we have adequate workarounds for virtually any other letter or symbol. But I do understand the issue is really whether or not we can inject these other symbols via html entities into page names, rather than requiring them to be entered directly as symbols. And that in some cases the latter might be inconvenient.

I'll try your fixes tomorrow, and maybe take a couple stabs at my idea and see what comes out.

Thanks for challenging my thinking as usual to explore all the possibilities of BoltWire...

Cheers,
Dan
