On 7 Apr 2024 18:23 +0200, from anton.sharo...@gmail.com (Anton Sharonov):
>>> For example Message-Id: <43265...@example.com> consisting of random
>>> digits and domain name
>>
>> There's a good reason for that; it help to ensure uniqueness, which prevents
>> problems with threading. By limiting itself to only digits, Yandex's IDs are
>> much more likely to collide.
>
> Hm, not sure about that... Given that string is long enough,
> random string which consists only out of digits can perhaps
> compete with (much shorter) random string of alpha-numeric
> characters - in terms of uniqueness probability?

Sure; assuming a Base64 charset, each character encodes log_2(64) = 6 bits; for decimal digits, each character encodes log_2(10) ~ 3.3 bits. Eight decimal digits encode only about 26.6 bits; eight Base64 characters encode 48 bits. (To encode 48 bits using only decimal digits you need at least 15 digits.)

So at 8 characters, the decimal-digit case has a decidedly non-trivial chance of a random collision within a fixed domain name, whereas the Base64 case is far less likely to collide at the same length; in both cases assuming that the part before the @ is assigned at random. Look up the "birthday paradox".

The value of the Message-ID header is required by RFC 5322 to be unique.

https://www.rfc-editor.org/rfc/rfc5322.html#section-3.6.4

That has been the case since RFC 822

https://www.rfc-editor.org/rfc/rfc822

(section 4.6.1, page 23).

RFC 5322 puts the uniqueness of the message ID as a "MUST" level requirement, even going so far as to state that _twice_ within two consecutive sentences. _How you guarantee that uniqueness_ is up to you, but not following that requirement _is_ going to cause a variety of issues, quite possibly up to and including messages not being delivered properly. Violate that requirement at your own peril.

-- 
Michael Kjörling 🔗 https://michael.kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”
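
PS: to put rough numbers on the birthday-paradox point above, here is a minimal sketch. It assumes the part before the @ is drawn uniformly at random from the given alphabet, and it uses the usual birthday approximation p ~ 1 - exp(-n(n-1)/(2N)) rather than the exact product. The function names are made up for this illustration and do not come from any mail software.

```python
import math


def bits_per_id(alphabet_size: int, length: int) -> float:
    """Entropy in bits of a uniformly random ID of the given length."""
    return length * math.log2(alphabet_size)


def collision_probability(alphabet_size: int, length: int, n_ids: int) -> float:
    """Approximate chance that at least two of n_ids random IDs coincide.

    Birthday approximation: p ~ 1 - exp(-n(n-1) / (2N)),
    where N is the number of possible IDs.
    """
    space = alphabet_size ** length
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-n_ids * (n_ids - 1) / (2 * space))


def ids_for_even_odds(alphabet_size: int, length: int) -> float:
    """Approximate number of IDs at which a collision becomes 50% likely."""
    space = alphabet_size ** length
    return math.sqrt(2 * space * math.log(2))


if __name__ == "__main__":
    for name, size in (("decimal", 10), ("Base64", 64)):
        print(
            f"{name}: {bits_per_id(size, 8):.1f} bits, "
            f"50% collision odds at about {ids_for_even_odds(size, 8):,.0f} IDs"
        )
```

Under those assumptions, eight random decimal digits reach even odds of a collision after roughly 11,800 messages from the same domain, while eight random Base64 characters need just under 20 million.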