On Sat, Oct 16, 2010 at 10:31:47PM +0200, Andreas Kalsch wrote:
> I agree with
>
> whitespace - this can be very confusing
>
> "="
>
> To add:
>
> Make keys lowercase (or even remove diacritics), because keys are always 
> simple names.

That's a different issue. I agree that keys should normally be lowercase, but
that's just a nice convention. There might be good reasons for uppercase keys,
for instance when the key name was used in upper case in some other system the
data was imported from.

There is no need to force people into a convention here. That's different from
the issue I have been talking about, where some characters cause real problems.

Jochen

>
> On 16.10.10 20:44, Jochen Topf wrote:
>> Hi!
>>
>> I am currently fighting some issues where tags with strange characters in
>> them need to be represented in a URL for Taginfo. Lots of other websites
>> probably will have similar issues. Characters like /, ?, &, etc. have special
>> meaning in URLs, so if they appear in tags I can't have those tags in URLs.
>> Sometimes escaping characters as %XX helps, sometimes not. And those problems
>> are not confined to web pages and URLs only. Special characters that need
>> escaping are often a problem.
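To illustrate the %XX escaping mentioned above, here is a minimal sketch in Python. The helper name key_to_url_segment is hypothetical and not part of Taginfo; it only assumes the standard library's urllib.parse.quote. Passing safe="" matters because quote() leaves "/" unescaped by default.

```python
from urllib.parse import quote, unquote


def key_to_url_segment(key: str) -> str:
    """Percent-encode a tag key for use as a single URL path segment.

    Hypothetical helper for illustration only. safe="" makes quote()
    escape "/" as well, which it would otherwise leave untouched.
    """
    return quote(key, safe="")


key = "maxspeed:weight>7.5"
segment = key_to_url_segment(key)
print(segment)                  # maxspeed%3Aweight%3E7.5
print(unquote(segment) == key)  # True: the encoding round-trips
```

Even with correct encoding, some web servers and frameworks decode sequences such as %2F before routing the request, which may be one of the cases where escaping does not help.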
>>
>> We can't really do anything about that with regard to tag values, they must
>> be allowed to contain all those characters. But it would help at least a
>> little if we knew those characters can never appear in tag keys. And I can't
>> really see a legitimate reason why we need those characters in keys. Looking
>> at the database, almost all cases where they appear in keys are obvious
>> errors. Out of about 20000 different keys, there are only about 190 keys
>> with problematic characters in them (and about another 800 with whitespace).
>> Really, the only cases that I can't immediately rule out as errors, or for
>> which I can't see an alternative tagging, are tag keys like
>> "maxspeed:weight>7.5". And with those you can already see the problems: some
>> of them have "&gt;" instead of the ">".
>>
>> So I'd like us to think about whether we can disallow a few characters from
>> appearing in tag keys. Technically this would mean changing the API to check
>> for those characters, removing any that are already in the database (can be
>> done with normal manual edits because there are so few cases) and adding
>> checks to the editors so that they can give meaningful error messages.
>> Shouldn't be too hard.
>>
>> So, what characters am I talking about? I haven't drawn up a complete list
>> and we certainly would need to discuss this further.
>>
>> Here is a preliminary list:
>>
>> Whitespace   Should use '_' instead of whitespace in keys; whitespace is
>>               also very confusing for users, especially at the beginning
>>               and end of a text.
>>
>> <>&/+?#;%'"  Special characters in XML, HTML and/or URLs.
>>
>> \'"          Characters often used for quoting.
>>
>> =            Because it's used in many places as the separation character
>>               between tag key and tag value. If we disallow this, we can
>>               always treat a string like "foo=bar" as k:foo, v:bar without
>>               any ambiguities.
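The point about "=" can be sketched as follows. This assumes keys never contain "=", so the first "=" in the string is always the separator; the function name split_tag is made up for this example.

```python
from typing import Tuple


def split_tag(tag: str) -> Tuple[str, str]:
    """Split "key=value" at the first "=".

    Illustrative helper only. It is unambiguous under the assumption
    that "=" is disallowed in keys; values may still contain "=".
    """
    key, sep, value = tag.partition("=")
    if not sep:
        raise ValueError(f"no '=' separator in {tag!r}")
    return key, value


print(split_tag("foo=bar"))       # ('foo', 'bar')
print(split_tag("note=a=b"))      # ('note', 'a=b')
```

Without the restriction, a string like "a=b=c" could mean k:a, v:b=c or k:a=b, v:c, and no parser could tell which was intended.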
>>
>> This is a small list of special characters; all other characters should still
>> be allowed. That means tag keys can still be in Chinese or whatever. We'd
>> just disallow a few characters that we know will cause problems again and
>> again.
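A check along the lines of the preliminary list could look roughly like this. It is only a sketch: the names FORBIDDEN, problematic_chars and is_valid_key are invented here, the character set is the preliminary one from above rather than an agreed list, and "whitespace" is assumed to mean anything Python's str.isspace() reports.

```python
# Preliminary set from the list above: < > & / + ? # ; % ' " \ =
# (an assumption for illustration, not a final or agreed list)
FORBIDDEN = set("<>&/+?#;%'\"\\=")


def problematic_chars(key: str) -> list:
    """Return the sorted distinct characters in key that would be disallowed."""
    return sorted({c for c in key if c in FORBIDDEN or c.isspace()})


def is_valid_key(key: str) -> bool:
    """True if the key contains none of the problematic characters."""
    return not problematic_chars(key)


print(problematic_chars("maxspeed:weight>7.5"))  # ['>']
print(problematic_chars("addr street"))          # [' ']
print(is_valid_key("name:zh"))                   # True
print(is_valid_key("名称"))                       # True: non-ASCII keys stay allowed
```

Returning the offending characters rather than just a boolean would let an editor show a meaningful error message instead of a bare rejection.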
>>
>> And to emphasize this again: I am only talking about tag keys. Tag values
>> must be allowed to contain the full Unicode set of characters.
>>
>> Jochen
>
>
> _______________________________________________
> dev mailing list
> dev@openstreetmap.org
> http://lists.openstreetmap.org/listinfo/dev
>

-- 
Jochen Topf  joc...@remote.org  http://www.remote.org/jochen/  +49-721-388298

