Dear colleagues,

It was pointed out that the ARIN example:

whois -h whois.arin.net POC SHRYA12-ARIN

is not correct, and should read:

whois -h whois.arin.net "p SHRYA12-ARIN"

(I used "POC" instead of "p", which could cause either "POC" to be additionally returned or no objects at all, depending on your whois client.)

Apologies,
Ed Shryane
RIPE NCC

> On 28 May 2024, at 11:27, Edward Shryane <[email protected]> wrote:
>
> Dear colleagues,
>
> There was a question about UTF-8 support by major Whois providers during last
> week's DB-WG session at RIPE88.
>
> During the UTF-8 discussion in December I checked the other RIRs as follows:
>
> LACNIC: only Latin-1 encoded characters are accepted in updates (UTF-8 is
> ignored) but UTF-8 is returned on port 43.
> Example: whois -h whois.lacnic.net PAP12
> APNIC: only Latin-1 is returned
> Example: whois -h testwhois.apnic.net YYYYMMDD-MNT
>
> Subsequently I tested the other RIRs to be sure:
>
> ARIN: UTF-8 is supported in the RPSL object and UTF-8 is returned on port 43.
> Example: whois -h whois.arin.net POC SHRYA12-ARIN
> AFRINIC: UTF-8 characters are accepted in updates and UTF-8 is returned on
> port 43.
> Example: whois -h whois.afrinic.net SHRYANE-MNT
>
> RIPE stores Latin-1 and returns Latin-1 on port 43.
>
> So in summary, 3 RIRs return UTF-8 and 2 RIRs return Latin-1 on port 43.
>
> Regards
> Ed Shryane
> RIPE NCC
>
>
>
>> On 2 May 2024, at 16:02, Edward Shryane <[email protected]> wrote:
>>
>> Dear colleagues,
>>
>> To follow up on the UTF-8 discussion in January, the DB team plans to
>> implement support for UTF-8 in 3 phases:
>>
>> (1) Add a flag to allow a client to choose a character set
>>
>> In the Whois release 1.112, we have added the "-Z / --charset" query flag to
>> allow clients to specify which character set they expect. The server
>> response will encode RPSL objects using that character set.
>>
>> This new flag can already be tested in the RC environment, e.g. the
>> SHRYANE-MNT object contains "remarks:" attributes with non-ASCII (but still
>> latin-1) characters:
>>
>> $ whois -h whois-rc.ripe.net -r shryane-mnt
>> $ whois -h whois-rc.ripe.net -r -Z utf8 shryane-mnt
>>
>> This flag has no impact on the default behaviour of the RIPE database. This
>> change only affects port 43, and the default character set remains latin-1.
>>
>> This flag is already useful, for example to capture responses as UTF-8
>> to a file or to use UTF-8 encoding in your terminal. In future, if the
>> default on port 43 changes to UTF-8, then clients can keep latin-1 by using
>> "-Z/--charset latin1".
>>
>> (2) Convert the database schema to UTF-8
>>
>> In the following Whois release, the DB team plans to switch the RIPE
>> database schema character set from latin-1 to UTF-8. This will allow Whois
>> to store UTF-8 strings in the database index tables.
>>
>> Switching the database schema character set will involve about 1 hour of
>> downtime to Whois updates, and Whois queries will not be affected. We will
>> announce this change in advance.
>>
>> This change will have no impact on the default behaviour of the RIPE
>> database. All interfaces will behave as before, and RPSL objects will remain
>> latin-1 encoded internally.
>>
>> (3) Allow UTF-8 to be used in RPSL objects
>>
>> Once the RIPE database schema supports the UTF-8 character set, the DB team
>> will create a further Whois release that will allow UTF-8 to be used in RPSL
>> objects, in addition to the index tables.
>>
>> The default behaviour of the RIPE database will remain the same. All
>> interfaces will behave as before, but RPSL objects will use UTF-8 internally.
>>
>> In future, if the DB-WG decides to allow UTF-8 characters in RPSL, the
>> database will already support it.
>>
>> Regards
>> Ed Shryane
>> RIPE NCC
>>
>>
>>> On 18 Jan 2024, at 10:34, Edward Shryane <[email protected]> wrote:
>>>
>>> Dear colleagues,
>>>
>>> Based on the discussion regarding UTF-8 in the RIPE database during the
>>> interim meeting yesterday, I suggest that we implement support for UTF-8 in
>>> the database (i.e. convert the schema and add a flag to allow a client to
>>> choose a character set), but we do not allow additional characters for now,
>>> pending further DB-WG discussion. Our intention is to lay the groundwork
>>> for future support, without breaking existing functionality. If you have
>>> any concerns or objections please let me know.
>>>
>>> We will now prepare an implementation plan / impact analysis of these
>>> changes.
>>>
>>> Regards
>>> Ed Shryane
>>> RIPE NCC
>>>
>>>
>>>> On 24 Nov 2023, at 10:03, Edward Shryane via db-wg <[email protected]> wrote:
>>>>
>>>> Dear colleagues,
>>>>
>>>> Currently the RIPE database only allows a subset of ASCII characters in
>>>> the "org-name:", "person:" and "role:" attributes, for a few reasons
>>>> including:
>>>>
>>>> * These attributes are also a look-up key and the Whois protocol does not
>>>>   allow specifying character sets in queries.
>>>> * RPSL names are ASCII according to RFC2622
>>>> * Using a normalised name makes the object easier to query
>>>> * Reading a normalised name is easier to interpret
>>>>
>>>> However there are some drawbacks to forcing names to only use a subset of
>>>> ASCII characters:
>>>>
>>>> * Organisations, roles and persons cannot use their actual name if it
>>>>   includes characters outside this subset.
>>>> * Normalisation is not standard, but is an interpretation done by each
>>>>   maintainer, e.g. characters could be excluded or converted in different
>>>>   ways.
>>>>
>>>> Since we support the Latin-1 character set in the RIPE database, I propose
>>>> we also allow non-ASCII Latin-1 characters in these attributes.
>>>>
>>>> Querying for a name can be done either using the latin-1 characters
>>>> (proposed) or a normalised, ASCII representation (currently). The
>>>> normalised version will be generated by Whois and stored in a database
>>>> index for querying. The primary key will also be generated from the
>>>> normalised version.
>>>>
>>>> Please let me know your feedback.
>>>>
>>>> Regards
>>>> Ed Shryane
>>>> RIPE NCC
>>>>
>>>> ---
>>>>
>>>> Whois attribute verbose description (copied from the help text).
>>>>
>>>> org-name
>>>> --------
>>>> Specifies the name of the organisation that this organisation object
>>>> represents in the RIPE Database. This is an ASCII-only text attribute.
>>>> The restriction is because this attribute is a look-up key and the
>>>> whois protocol does not allow specifying character sets in queries.
>>>> The user can put the name of the organisation in non-ASCII character
>>>> sets in the "descr:" attribute if required.
>>>>
>>>> A list of 1 to 30 words separated by white space.
>>>> A word is made up of ASCII alphanumeric characters and additionally:
>>>> ][)(._"*@,&:!'`+/-
>>>> A word may have up to 64 characters and is not case sensitive.
>>>> Each word can have any combination of the above characters with no
>>>> restriction on the start or end of a word.
>>>>
>>>> person
>>>> ------
>>>> Specifies the full name of an administrative, technical or zone
>>>> contact person for other objects in the database.
>>>>
>>>> It should contain 2 to 10 words.
>>>> A word is made up of ASCII alphanumeric characters and additionally: .`'_-
>>>> The first word should begin with a letter.
>>>> At least one other word should also begin with a letter.
>>>> Max 64 characters can be used in each word.
>>>>
>>>> role
>>>> ----
>>>> Specifies the full name of a role entity, e.g. RIPE DBM.
>>>>
>>>> A list of 1 to 30 words separated by white space.
>>>> A word is made up of ASCII alphanumeric characters and additionally:
>>>> ][)(._"*@,&:!'`+/-
>>>> A word may have up to 64 characters and is not case sensitive.
>>>> Each word can have any combination of the above characters with no
>>>> restriction on the start or end of a word.
>>>>
>>>>
>>>> --
>>>>
>>>> To unsubscribe from this mailing list, get a password reminder, or change
>>>> your subscription options, please visit:
>>>> https://lists.ripe.net/mailman/listinfo/db-wg
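
For anyone who wants to script the port 43 comparison rather than rely on a particular whois client, here is a minimal Python sketch. It is not taken from the Whois code base. The function names (build_query, decode_response, whois) are hypothetical, and it assumes that query flags such as "-Z utf8" are simply sent as part of the query line, as in the whois command examples in this thread. The decoding step is a heuristic: it tries UTF-8 first and falls back to Latin-1, so a pure ASCII response is reported as UTF-8 even though it is valid in both.

```python
import socket
from typing import Optional, Tuple


def build_query(key: str, charset: Optional[str] = None) -> bytes:
    """Build the query line sent on port 43.

    Assumption: the "-Z <charset>" flag is passed as part of the query text.
    The line is terminated with CRLF.
    """
    flags = "-Z {} ".format(charset) if charset else ""
    return "{}{}\r\n".format(flags, key).encode("ascii")


def decode_response(raw: bytes) -> Tuple[str, str]:
    """Decode a response and report which encoding was used.

    Heuristic only: strict UTF-8 is tried first, then Latin-1.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this cannot fail.
        return raw.decode("latin-1"), "latin-1"


def whois(host: str, key: str, charset: Optional[str] = None,
          timeout: float = 10.0) -> Tuple[str, str]:
    """Send one query to host on port 43 and return (text, encoding)."""
    with socket.create_connection((host, 43), timeout=timeout) as sock:
        sock.sendall(build_query(key, charset))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return decode_response(b"".join(chunks))
```

For example, whois("whois-rc.ripe.net", "-r shryane-mnt", charset="utf8") would send the same query as the second RC example in the 2 May mail, and the second element of the result indicates whether the bytes decoded as UTF-8 or needed the Latin-1 fallback.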
