Agreed on 1) and 2). But how exactly do you define "adding a space before each uppercase letter that starts a word"? How do you find this "uppercase letter that starts a word" in a pagename or link? Can you give a few examples?
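To make the question concrete, here is a rough Python sketch of one possible reading of rules (1), (3) and (4) from the mail quoted below. Everything in it is an assumption for illustration and not anything from JSPWiki: the names normalize_page_name and same_page are made up, a "word start" is guessed to be an ASCII uppercase letter directly after a lowercase letter or digit, and rule (2) is left out because the thread does not say how space names are written.

```python
import re

# Assumption: a new word starts at an ASCII uppercase letter that directly
# follows a lowercase letter or digit. Other definitions are possible.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")
_WHITESPACE = re.compile(r"\s+")


def normalize_page_name(name: str) -> str:
    """Hypothetical normalizer for rules (1) and (3)."""
    # Rule (1): trim the ends and collapse inner whitespace to one space.
    collapsed = _WHITESPACE.sub(" ", name.strip())
    # Rule (3): insert a space at each guessed CamelCase word boundary.
    return _CAMEL_BOUNDARY.sub(" ", collapsed)


def same_page(a: str, b: str) -> bool:
    """Rule (4): compare normalized names case-insensitively."""
    return normalize_page_name(a).casefold() == normalize_page_name(b).casefold()


if __name__ == "__main__":
    for sample in ["CamelCasePageNames", "JSPWiki", "TextformattingrulesKorean"]:
        print(repr(sample), "->", repr(normalize_page_name(sample)))
```

Under this reading "JSPWiki" stays untouched, because no lowercase letter precedes an uppercase one, and "TextformattingrulesKorean" still does not match "Text formatting rulesKorean". Those are the kinds of cases where the definition of a word start matters.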
/Harry

2009/11/2 Andrew Jaquith <[email protected]>

> Ok, that makes sense. I can think of cases in English too, like
> "averse" (opposed to) and "a verse" (a portion of a song or poem). I
> just decided that I didn't care. :)
>
> But assuming we do care...
>
> ...what about going the other way: on import, or on page save, or page
> lookup, forcibly expanding CamelCasePageNames (and inline page links)
> so that they have one space in between the words? That way,
> case-insensitive matching with spaces preserved (trimmed to one space)
> would work.
>
> So, the rules would be this:
>
> (1) When links in pages are parsed, or page names are saved, leading
> and trailing spaces will be trimmed, and all whitespace between words
> will be replaced with one space character.
> (2) Whitespace before and after the space name will be removed.
> (3) CamelCase page links or page names will be normalized by adding a
> space before each uppercase letter that starts a word.
> (4) Tests for page name equality are done by applying rules (1), (2)
> and (3) and making a case-insensitive comparison.
>
> That seems simple enough, no?
>
> Andrew
>
> On Mon, Nov 2, 2009 at 2:44 PM, Janne Jalkanen <[email protected]>
> wrote:
> >> Can you provide some examples where a
> >> strip-the-whitespace-and-do-a-case-insensitive-comparison strategy
> >> would not work, in Finnish? I'd like to understand this, seriously.
> >
> > E.g. "maan alle" vs "maanalle". The first means "into the ground", the
> > next one is "earth bear".
> >
> > Or "kuusi puuta" vs "kuusipuuta" - "six trees" vs "at a fir" (or "of
> > fir timber").
> >
> > Or simply "sivusta katsoja" vs "sivustakatsoja" - "a person who looks
> > (literally) from the sides" vs "onlooker". The difference is subtler
> > than with the previous ones, but the existence of the space is
> > significant information.
> >
> > In fact, getting mixed up when two words go together and when they do
> > not is one of the most common grammatical errors. Sometimes the
> > results can be fairly hilarious and unintended. Often it looks just
> > sad.
> >
> > But the point being that in Finnish (and other so-called constructed
> > languages), whitespace is significant. So it should not be ignored
> > arbitrarily.
> >
> > Besides, I am not aware of any wiki engines that would consider
> > whitespace insignificant in determining pagename equality. MediaWiki's
> > rules concerning spaces are:
> >
> > <snip>
> > Spaces/underscores which are ignored:
> > * those at the start and end of a full page name
> > * those at the end of a namespace prefix, before the colon
> > * those after the colon of the namespace prefix
> > * duplicate consecutive spaces
> > <snap>
> >
> >> FYI, I took a look at JSPWiki.org to see what the scale of the problem
> >> might be. The site has about 4850 pages. I yanked down all of the page
> >> names and compared them. I detected exactly ONE name clash: "Text
> >> formatting rulesKorean" and "TextformattingrulesKorean" appear to be
> >> different pages. That is a 0.02% collision rate -- and easily handled
> >> by a rename-on-import or special-page redirection strategy.
> >
> > That's not what I meant. I meant that we have many links of the form
> > [word1 word2] embedded within running text. If we change those, then
> > the running text becomes meaningless and needs to be *checked by
> > hand*.
> >
> > /Janne
