Hallvard B Furuseth wrote:
Yves Dorfsman writes:
Is there any reason not to amend the LDIF RFC to accept UTF-8 characters
without base64 encoding?
It does allow that.
Really ?
There is one confusing paragraph in the RFC
(http://www.ietf.org/rfc/rfc2849.txt):
    4) Any dn or rdn that contains characters other than those
       defined as "SAFE-UTF8-CHAR", or begins with a character other
       than those defined as "SAFE-INIT-UTF8-CHAR", above, MUST be
       base-64 encoded. Other values MAY be base-64 encoded. Any
       value that contains characters other than those defined as
       "SAFE-CHAR", or begins with a character other than those
       defined as "SAFE-INIT-CHAR", above, MUST be base-64 encoded.
       Other values MAY be base-64 encoded.
But SAFE-UTF8-CHAR is not defined anywhere, and then:
    dn-spec           = "dn:" (FILL distinguishedName /
                               ":" FILL base64-distinguishedName)
    distinguishedName = SAFE-STRING
                        ; a distinguished name, as defined in [3]
    .../...
    SAFE-STRING       = [SAFE-INIT-CHAR *SAFE-CHAR]
    .../...
    SAFE-CHAR         = %x01-09 / %x0B-0C / %x0E-7F
                        ; any value <= 127 decimal except NUL, LF,
                        ; and CR
And distinguishedName is just one example: all the values are defined as
either SAFE-STRING or a base64 equivalent.
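
To make the effect of that grammar concrete, here is a minimal Python sketch
of the SAFE-STRING test. SAFE-CHAR is taken from the excerpt above;
SAFE-INIT-CHAR is elided there, so the range used here is the one RFC 2849
gives (SAFE-CHAR minus SPACE, ":" and "<"). The name is_safe_string is made
up for illustration and is not part of any LDAP library.

```python
import re

# SAFE-STRING = [SAFE-INIT-CHAR *SAFE-CHAR], matched against raw bytes.
# First class: SAFE-INIT-CHAR (no NUL, LF, CR, SPACE, ":" or "<").
# Second class: SAFE-CHAR (any byte <= 127 except NUL, LF and CR).
SAFE_STRING = re.compile(
    rb"(?:[\x01-\x09\x0B\x0C\x0E-\x1F\x21-\x39\x3B\x3D-\x7F]"
    rb"[\x01-\x09\x0B\x0C\x0E-\x7F]*)?"
)


def is_safe_string(value: bytes) -> bool:
    """Return True if value may appear in LDIF without base64 encoding."""
    return SAFE_STRING.fullmatch(value) is not None


print(is_safe_string(b"cn=Jose,dc=example,dc=com"))
print(is_safe_string("cn=José,dc=example,dc=com".encode("utf-8")))
```

Under this reading of the grammar, the second DN fails the test because the
UTF-8 encoding of "é" consists of bytes above 0x7F, so it would have to be
written in the base64 form.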
And sure enough, as I mentioned earlier, most APIs and servers accept
non-base64-encoded strings when reading, but always encode in base64 when
writing a value that contains a non-ASCII character.
So what I am thinking about is replacing SAFE-CHAR with SAFE-UTF8-CHAR, and
giving SAFE-UTF8-CHAR a proper definition.
Maybe what you are thinking of is that some applications base64-encode
any string which contains 8-bit characters, either because it would be
more work to decide whether a string is valid UTF-8, or because the
application can't know how its output will be used, e.g. whether the
output will be printed to a Latin-3 terminal or sent as 7-bit e-mail.
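
A rough sketch of that conservative writer-side behaviour, in Python. The
function name ldif_line is hypothetical, and the sketch leaves out line
folding and the other details a real LDIF writer needs; it only shows the
choice between the plain "attr: value" form and the base64 "attr:: value"
form.

```python
import base64


def ldif_line(attr: str, value: str) -> str:
    """Emit one unfolded LDIF attribute line, base64-encoding when needed."""
    raw = value.encode("utf-8")
    unsafe = (
        # Any 8-bit byte, or NUL / LF / CR anywhere in the value.
        any(b > 0x7F or b in (0x00, 0x0A, 0x0D) for b in raw)
        # A leading SPACE, ":" or "<" is not allowed in the plain form either.
        or raw[:1] in (b" ", b":", b"<")
    )
    if unsafe:
        return f"{attr}:: {base64.b64encode(raw).decode('ascii')}"
    return f"{attr}: {value}"


print(ldif_line("cn", "Jose"))
print(ldif_line("cn", "José"))
```

With this sketch, "Jose" is written as "cn: Jose" while "José" becomes
"cn:: Sm9zw6k=", whatever terminal or mail transport ends up carrying the
output.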
So then, shouldn't the RFC make that clear? When I researched this, I
remember seeing that one of the LDAP servers (either Oracle's or IBM's, I
can't remember which) added an extension to LDIF and accepted a "charset:"
tag. Maybe we should add that to the RFC, or, these days, just make it UTF-8?
--
Yves.
http://www.sollers.ca/blog/2008/swappiness
http://www.sollers.ca/blog/2008/swappiness/.fr