On 22/11/2025 03:59, Andrew Sullivan wrote:
On Sat, Nov 22, 2025 at 02:31:41AM -0500, Petr Menšík wrote:
Yes, it is binary data. Any binary content is permitted. It depends
only on how you choose to display it.
Then the requirement about UTF-8 is vacuous and should be removed from
the document. The problem with that approach, of course, is that the
interoperability argument for standardizing this at all is rather
harmed. But since there's a semantics to the tags in a TXT RR
implementing the specification, it's sort of hard to believe it's
really just a matter of whatever the display wants to do.
I would not call it a requirement. I am not sure which document you are
referring to. This is an idea without a formal draft. The fval tag of
draft-davids-forsalereg is a good example: "v=FORSALE1;fval=€999" is
readable for people and should be printed in a human-friendly way, IMO.
How did we get from binary-only data to automatic processing and a
normalization form of some kind? If the data contains only printable
characters in a verified encoding, it is reasonably safe not to
escape each byte.
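
A minimal sketch of that display rule, under stated assumptions: the
function name present_txt is hypothetical, Python's str.isprintable()
stands in for "printable", and the fallback uses the \DDD decimal
escapes of DNS presentation format. It is an illustration, not what any
particular DNS tool does.

```python
def present_txt(data: bytes) -> str:
    """Return a display form for one TXT character-string (hypothetical helper)."""
    try:
        # "Verified encoding": strict decoding rejects malformed UTF-8.
        text = data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        text = None

    if text is not None and text.isprintable():
        # Human-readable path: escape only the backslash and the quote.
        return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

    # Fallback path: treat the data as binary and escape byte by byte.
    out = []
    for b in data:
        if 0x20 <= b < 0x7F and b not in (0x22, 0x5C):
            out.append(chr(b))
        else:
            out.append("\\%03d" % b)
    return '"' + "".join(out) + '"'
```

With this sketch the fval example above is shown unchanged, while a
string such as the bytes FF 00 followed by "abc" comes out as
"\255\000abc".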
To the extent I understand this, I'm pretty sure I disagree. I don't
think this is the right list for elementary education about text
encoding; but you seem already to have "only printable characters" in
your assumptions there, so perhaps I'll ask you this: Are ZWJ and ZWNJ
printable or not? If you don't understand the question, don't know
what the answer is, or don't know why that is a trick question then I
suggest you can't wave away this set of problems. If instead you want
to say that literally _any_ binary data is allowed, then say that. If
you want to suggest something other than some PRECIS profile (I think
it was in this thread that Paul Hoffman made a different suggestion),
then do that. But the IETF has made a hash of internationalization
over and over again precisely because of specification writers waving
away the problems inherent in writing systems and their encodings
online. Perhaps not ironically, one of the earliest examples of this
is the DNS itself, which is "8 bit clean" except, of course, for that
little part where some bits match other bits in some cases (this is
case folding). The intermediate systems are supposed to cache the
original form, but that doesn't always happen.
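
The case-folding exception can be sketched as follows. DNS name
comparison folds only the ASCII letters A-Z and a-z, and every other
octet must match exactly. The function name dns_labels_equal is
hypothetical; the sketch relies on Python's bytes.lower() changing
ASCII letters only.

```python
def dns_labels_equal(a: bytes, b: bytes) -> bool:
    """Compare two labels the way DNS does (hypothetical helper).

    bytes.lower() maps only ASCII A-Z to a-z; octets >= 0x80 are left
    untouched, so non-ASCII data is compared bit for bit.
    """
    return a.lower() == b.lower()
```

Here b"Example" and b"eXAMPLE" compare equal, while the UTF-8 encodings
of "É" and "é" do not.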
Python 3 says ZWJ is not printable, and neither is ZWNJ. These
definitions are corner cases. It is up to higher layers to render text;
that is the job of GUI toolkits, terminal emulators and the like.
"\u200C".isprintable() == False
I think native code should use iswprint() to guess whether to escape
or not. When the first code point is non-printable, the rest can be
escaped; it does not seem to be text for humans. Of course, isprint()
must not be used on raw, undecoded UTF-8 bytes if the results are to
be sensible.
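
A small illustration of why the check has to run on decoded code points
rather than on raw bytes. The two function names are hypothetical, the
byte-wise check mimics isprint() in the C locale, and str.isprintable()
stands in for iswprint(), whose actual results depend on the libc and
locale in use.

```python
def bytewise_printable(data: bytes) -> bool:
    """isprint()-style check on raw bytes, C locale assumed (0x20..0x7E)."""
    return all(0x20 <= b < 0x7F for b in data)


def codepointwise_printable(data: bytes) -> bool:
    """Decode as UTF-8 first, then ask about whole code points."""
    try:
        return data.decode("utf-8").isprintable()
    except UnicodeDecodeError:
        # Not valid UTF-8 at all, so not text for humans.
        return False
```

For the UTF-8 bytes of "háčky" the byte-wise check reports
non-printable, because every lead and continuation byte is above 0x7E,
while the code-point check reports printable.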
Whether PRECIS or RFC 9839 is used is an implementation detail. As long
as both result in "háčkyčárky".isprintable() == True, either of them is
fine. I think that for printing text RFC 9839 is sufficient and the
simpler of the two, and I would recommend it for DNS software. Such
software does not need to know what an uppercase letter or a digit is
in Arabic. It needs to escape only the Problematic Code Points
specified there and print the rest of the UTF-8 encoded text as normal
text. If a domain name can contain a socks icon, why not a TXT record?
Why are no emoticons rendered?
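
A hedged sketch of that approach. The three classes below (surrogates,
legacy controls other than tab, LF and CR, and noncharacters) are my
reading of the problematic code points in RFC 9839 and should be
checked against the RFC text. The function names and the \u{XXXX}
escape notation are hypothetical.

```python
def is_problematic(cp: int) -> bool:
    """True if cp is in one of the three classes described above."""
    if 0xD800 <= cp <= 0xDFFF:  # surrogates
        return True
    if cp < 0x20 and cp not in (0x09, 0x0A, 0x0D):  # C0 controls, minus tab/LF/CR
        return True
    if cp == 0x7F or 0x80 <= cp <= 0x9F:  # DEL and C1 controls
        return True
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:  # noncharacters
        return True
    return False


def escape_problematic(text: str) -> str:
    """Escape problematic code points, pass everything else through."""
    return "".join(
        "\\u{%04X}" % ord(ch) if is_problematic(ord(ch)) else ch
        for ch in text
    )
```

Note that under this rule ZWNJ (U+200C) and the socks emoji (U+1F9E6)
both pass through unescaped. This check is therefore more permissive
than isprintable(), which rejects ZWNJ.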
record, or only in the ftxt subtype. For instance, is the host part
of an furi entry required to be an ASCII string (i.e. if it's an
IDN, must it be the A-label form?) or may it include UTF-8 strings
beyond the ASCII-equivalent range? It seems to me it would be
valuable to specify which is meant.
Domain labels are out of scope, IDN is unrelated.
I find that a little hard to swallow given that the content of a furi
entry is a URI.
A URI can have a normalized form that uses only A-labels in the host
and percent-encoded characters in the path. It can be presented to the
user in U-label form, without percent-escaping in the path, in a nice
way. https://háčkyčárky.cz/ can be sent on the wire as
https://xn--hkyrky-ptac70bc.cz/ without losing any information. The
remote party can display it in a form that is nice for humans. Most
DNS tools do not present TXT records to users in a similar form.
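
The round trip above can be sketched with the Python standard library.
One loud assumption: the built-in "idna" codec implements IDNA2003, not
IDNA2008, so it is only a stand-in here. For this particular name it
gives the same A-label. The function names are hypothetical, and
userinfo in the URL is not handled.

```python
from urllib.parse import urlsplit


def host_to_a_label(url: str) -> str:
    """Rewrite the host of a URL into A-label (xn--) form for the wire."""
    parts = urlsplit(url)
    host = (parts.hostname or "").encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return parts._replace(netloc=netloc).geturl()


def host_to_u_label(url: str) -> str:
    """Rewrite the host of a URL into U-label form for display."""
    parts = urlsplit(url)
    host = (parts.hostname or "").encode("ascii").decode("idna")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return parts._replace(netloc=netloc).geturl()
```

Applied to https://háčkyčárky.cz/, the first function yields
https://xn--hkyrky-ptac70bc.cz/, and the second function reverses it.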
They have to be compared case-insensitively, which requires decoding
each code point and locating the proper lowercase/uppercase letter
matching the source.
Great. What is the uppercase match of â? Is it different to the
uppercase match of ä? All the time in every language? (a hint:
IDNA2008 solves this problem for you, so there's never a case where
the answer to this is ambiguous for an a-label/u-label pair.)
Can you specify where exactly I need that information? I only want TXT
records with non-English text to be printed readably. The one exception
is Multicast DNS labels, where IDNA2008 is not used and UTF-8 is used
directly. The only question is whether or not I need to escape when
presenting a response.
The content of records remains application-specific. If I look at the
google.com TXT response, I do not see any escaped data, even though it
also contains binary content in some base64 encoding.
Sure, but we weren't talking about any TXT record. I was talking
about draft-davids-forsalereg, not google.com TXT records, so I
don't understand how they are relevant.
Best regards,
A
Best Regards,
Petr
--
Petr Menšík
Senior Software Engineer, RHEL
Red Hat, https://www.redhat.com/
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB
_______________________________________________
DNSOP mailing list -- [email protected]