Hallvard B Furuseth wrote:
Yves Dorfsman writes:
Is there any reason not to amend the LDIF RFC to accept UTF-8 characters
without base64 encoding?

It does allow that.

Really?

One paragraph in the RFC (http://www.ietf.org/rfc/rfc2849.txt) is confusing:

      4)  Any dn or rdn that contains characters other than those
          defined as "SAFE-UTF8-CHAR", or begins with a character other
          than those defined as "SAFE-INIT-UTF8-CHAR", above, MUST be
          base-64 encoded.  Other values MAY be base-64 encoded.  Any
          value that contains characters other than those defined as
          "SAFE-CHAR", or begins with a character other than those
          defined as "SAFE-INIT-CHAR", above, MUST be base-64 encoded.
          Other values MAY be base-64 encoded.

But then, SAFE-UTF8-CHAR is not defined anywhere, and the grammar says:

dn-spec                  = "dn:" (FILL distinguishedName /
                                  ":" FILL base64-distinguishedName)

distinguishedName        = SAFE-STRING
                           ; a distinguished name, as defined in [3]
.../...

SAFE-STRING              = [SAFE-INIT-CHAR *SAFE-CHAR]

.../...

SAFE-CHAR                = %x01-09 / %x0B-0C / %x0E-7F
                           ; any value <= 127 decimal except NUL, LF,
                           ; and CR


And distinguishedName is just one example: all the values are defined as either SAFE-STRING or its base64 equivalent.

And sure enough, as I mentioned earlier, most APIs and servers will accept non-base64-encoded strings when reading, but they always encode in base64 when a value contains a non-ASCII character.
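
To make that concrete, here is a rough sketch of how a writer that follows the grammar as written might decide. The SAFE-CHAR set comes from the excerpt above; the SAFE-INIT-CHAR set (SAFE-CHAR minus SPACE, ":" and "<") is taken from RFC 2849 and is not part of the excerpt. The function names are made up for illustration, and line folding is ignored.

```python
import base64

# SAFE-CHAR per the excerpt above: any byte <= 127 except NUL, LF and CR.
SAFE_CHAR = set(range(0x01, 0x80)) - {0x0A, 0x0D}
# SAFE-INIT-CHAR per RFC 2849: SAFE-CHAR minus SPACE, ":" and "<".
SAFE_INIT_CHAR = SAFE_CHAR - {0x20, 0x3A, 0x3C}


def is_safe_string(raw: bytes) -> bool:
    """Return True if raw matches SAFE-STRING = [SAFE-INIT-CHAR *SAFE-CHAR]."""
    if not raw:
        return True  # SAFE-STRING is optional, so the empty value matches
    return raw[0] in SAFE_INIT_CHAR and all(b in SAFE_CHAR for b in raw[1:])


def ldif_line(attr: str, value: str) -> str:
    """Hypothetical helper: emit one attribute line, base64-encoding if needed."""
    raw = value.encode("utf-8")
    if is_safe_string(raw):
        return f"{attr}: {value}"
    return f"{attr}:: {base64.b64encode(raw).decode('ascii')}"
```

With this, ldif_line("cn", "Yves") gives "cn: Yves", while a value such as "Hélène" comes out in the "cn:: " base64 form, because its UTF-8 bytes fall outside SAFE-CHAR.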

So what I am thinking about is replacing SAFE-CHAR with SAFE-UTF8-CHAR, and giving SAFE-UTF8-CHAR a proper definition.
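
As a sketch only, one possible reading of such a definition would be: the value is valid UTF-8, contains no NUL, LF or CR, and does not begin with SPACE, ":" or "<". This is my assumption, not anything the RFC states, and the function name is hypothetical.

```python
def is_safe_utf8_string(raw: bytes) -> bool:
    """Hypothetical check for a UTF-8-aware SAFE-STRING (not defined by the RFC)."""
    try:
        # Strict decoding rejects invalid sequences, e.g. stray Latin-1 bytes.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    if not text:
        return True
    # Same initial-character restrictions as SAFE-INIT-CHAR: SPACE, ":" and "<".
    if text[0] in " :<":
        return False
    # Same exclusions as SAFE-CHAR: NUL, LF and CR.
    return not any(ch in "\x00\n\r" for ch in text)
```

A writer using this check could emit "Hélène" as is and still fall back to base64 for anything that is not valid UTF-8, such as a Latin-1 encoded value.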


Maybe what you are thinking of is that some applications base64-encode
any string which contains 8-bit characters, either because it would be
more work to decide if a string is valid UTF-8, or because the
application can't know how its output will be used, e.g. if the output
will be printed to a Latin-3 terminal or sent as 7-bit e-mail.

So then, shouldn't the RFC make that clear? When I researched this, I remember seeing that one of the LDAP servers (either Oracle's or IBM's, I can't remember which) added an extension to LDIF and accepted a "charset:" tag. Maybe we should add that to the RFC, or, these days, just make it UTF-8?




--
Yves.
http://www.sollers.ca/blog/2008/swappiness
http://www.sollers.ca/blog/2008/swappiness/.fr

