> From: Kostik

> Yes, I'm talking about 8-bit encoded message:
> Content-Type: text/html;charset=koi8-r
> http://??????.??
> ---
> In the real world such messages are exist. Is it possible to somehow encode
> such domains in Punycode and only then use DNSBL?

After I wrote a bunch of code and started to test it, I realized this
is an even bigger mess than I thought. The trouble is the practically
unlimited number of Content-Type charsets. Punycode is a scheme for
encoding Unicode domain names in the RFC 1034 subset of ASCII. How can
dccproc/dccm/dccifd convert the many Cyrillic and other non-ASCII
character sets to Unicode?

> Now this situation in the logs looks like this:
> ---
> DNSBL helper URL \208\210\201\215\197\212.\210\198
> gethostbyname(\208\210\201\215\197\212.\210\198.dbl.spamhaus.org): Unknown
> host\n

That is a good example of the hopelessness of the situation, because it
is in koi8-r. Unless I add code to recognize "charset=koi8-r", koi8-u,
cp866, Windows-1251, and perhaps 8859-5, dccm/dccifd/dccproc cannot know
how to convert that domain name to Punycode. And that's only for
Cyrillic.

The best I could do is notice when a domain label looks like UTF-8,
and convert it to Unicode and then to Punycode.


Vernon Schryver    [email protected]
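To make the problem concrete, here is a minimal sketch in Python of the
"looks like UTF-8" approach, with an optional fallback to a declared
charset. It is not DCC code. The function name to_ace and its charset
parameter are hypothetical, and the sketch relies on Python's built-in
"idna" codec (IDNA 2003) for the Punycode step. The test bytes are the
ones from the log above, written as decimal values.

```python
from typing import Optional


def to_ace(raw: bytes, charset: Optional[str] = None) -> Optional[str]:
    """Return the ASCII (Punycode) form of a raw domain name, or None.

    Hypothetical helper: try UTF-8 first, then the charset declared in
    the message's Content-Type header, if the caller supplies one.
    """
    if raw.isascii():
        # Plain ASCII names need no conversion.
        return raw.decode("ascii")
    try:
        # Strict decoding fails on most 8-bit text that is not UTF-8.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        if charset is None:
            return None  # cannot guess among koi8-r, cp866, cp1251, ...
        try:
            text = raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            return None  # wrong or unknown charset name
    try:
        # The "idna" codec applies Punycode per label and adds "xn--".
        return text.encode("idna").decode("ascii")
    except UnicodeError:
        return None  # empty, over-long, or otherwise invalid label


# The bytes from the log: \208\210\201\215\197\212.\210\198
logged = bytes([208, 210, 201, 215, 197, 212, 46, 210, 198])

print(to_ace(logged))            # no charset known, not UTF-8: None
print(to_ace(logged, "koi8-r"))  # convertible once the charset is known
```

Without the charset the logged name cannot be converted, because those
bytes are not valid UTF-8. The UTF-8 check is only a heuristic: short
strings in other 8-bit charsets can happen to be valid UTF-8 and would
then be converted to the wrong name.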
