Hugo Monteiro <hugo.monte...@fct.unl.pt> writes:

> That's pretty easy to do, but i'm finding the RFC a bit confusing. Could
> you help to clarify EXACTLY what codes/code ranges should be translated?

You do not have to encode any octet that falls in this set:

|      UTF1SUBSET     = %x01-27 / %x2B-5B / %x5D-7F
|                          ; UTF1SUBSET excludes 0x00 (NUL), LPAREN,
|                          ; RPAREN, ASTERISK, and ESC.

I would implement it like this:

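--
/* out must have room for 3 * strlen(username) + 1 bytes */
static char const HEX_STR[] = "0123456789abcdef";

for (ptr = username; *ptr; ++ptr) {
    unsigned char c = (unsigned char)*ptr;

    /* escape the filter specials and every non-7-bit octet as \XX */
    if (c == '*' || c == '(' || c == ')' || c == '\\' || c >= 0x80) {
        *out++ = '\\';
        *out++ = HEX_STR[c >> 4];
        *out++ = HEX_STR[c & 0x0f];
    } else {
        *out++ = (char)c;
    }
}
*out = '\0';   /* terminate the escaped string */
--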

*Valid* UTF-8 sequences do not need to be encoded. But finding out whether
a non-7-bit char is part of a valid UTF-8 sequence is complicated, and
because the RFC allows any octet to be escaped, I would simply escape
every value >= 0x80 instead of validating the sequence.
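
For completeness, here is a rough sketch of the same idea as a
self-contained function with a bounds check. The name
ldap_filter_escape(), its signature and the (size_t)-1 error convention
are my own assumptions for illustration; nothing like this exists in
DSPAM under that name.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not an existing DSPAM function): escape `in` so it
 * can be used as an assertion value in an LDAP search filter.
 * Escapes '*', '(', ')', '\\' and every octet >= 0x80 as \XX.
 * Returns the length of the escaped string, or (size_t)-1 if `out`
 * (of size `outlen`) is too small. `out` is NUL-terminated in both
 * cases, provided outlen > 0.
 */
static size_t ldap_filter_escape(const char *in, char *out, size_t outlen)
{
    static const char hex[] = "0123456789abcdef";
    size_t n = 0;

    if (outlen == 0)
        return (size_t)-1;

    for (; *in; ++in) {
        unsigned char c = (unsigned char)*in;
        int esc = (c == '*' || c == '(' || c == ')' || c == '\\' || c >= 0x80);
        size_t need = esc ? 3 : 1;

        /* always keep one byte free for the terminating NUL */
        if (outlen - n <= need) {
            out[n] = '\0';
            return (size_t)-1;
        }

        if (esc) {
            out[n++] = '\\';
            out[n++] = hex[c >> 4];
            out[n++] = hex[c & 0x0f];
        } else {
            out[n++] = (char)c;
        }
    }

    out[n] = '\0';
    return n;
}
```

A buffer of 3 * strlen(in) + 1 bytes is always large enough, since each
input byte grows to at most three output bytes.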



Enrico

_______________________________________________
Dspam-devel mailing list
Dspam-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-devel
