Dear colleagues,

It was pointed out that the ARIN example:

  $ whois -h whois.arin.net POC SHRYA12-ARIN

is not correct, and should read:

  $ whois -h whois.arin.net "p SHRYA12-ARIN"

(I used "POC" instead of "p", which could cause either results for "POC" to 
be returned as well or no objects at all, depending on your whois client).
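
To show why the quoting matters, here is a minimal sketch of a port 43 client in Python. It assumes only what RFC 3912 describes: the whole query is sent as a single line terminated by CRLF, and the server replies and closes the connection. The function names build_query and whois are hypothetical, and how a particular command-line whois client assembles its arguments into that line may differ.

```python
import socket


def build_query(query: str) -> bytes:
    """Encode one query line as sent on port 43 (CRLF-terminated, per RFC 3912)."""
    return query.encode("ascii") + b"\r\n"


def whois(host: str, query: str, port: int = 43, timeout: float = 10.0) -> bytes:
    """Send a single query and return the raw, undecoded response bytes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_query(query))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks)
```

With this sketch, whois("whois.arin.net", "p SHRYA12-ARIN") sends the flag and the handle together as one query line, which is what the quoted form of the command is intended to do.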

Apologies,

Ed Shryane
RIPE NCC


> On 28 May 2024, at 11:27, Edward Shryane <[email protected]> wrote:
> 
> Dear colleagues,
> 
> There was a question about UTF-8 support by major Whois providers during last 
> week's DB-WG session at RIPE88.
> 
> During the UTF-8 discussion in December I checked the other RIRs as follows:
> 
> LACNIC: only Latin-1 encoded characters are accepted in updates (UTF-8 is 
> ignored) but UTF-8 is returned on port 43.
> Example: whois -h whois.lacnic.net PAP12
> APNIC: only Latin-1 is returned
> Example: whois -h testwhois.apnic.net YYYYMMDD-MNT
> 
> Subsequently I tested the other RIRs to be sure:
> 
> ARIN: UTF-8 is supported in the RPSL object and UTF-8 is returned on port 43.
> Example: whois -h whois.arin.net POC SHRYA12-ARIN
> AFRINIC: UTF-8 characters are accepted in updates and UTF-8 is returned on 
> port 43.
> Example: whois -h whois.afrinic.net SHRYANE-MNT
> 
> RIPE stores Latin-1 and returns Latin-1 on port 43.
> 
> So in summary, 3 RIRs return UTF-8 and 2 RIRs return Latin-1 on port 43.
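
Since the character set returned on port 43 differs per RIR, a client that queries several of them has to decide how to decode each response. A minimal sketch in Python, assuming the client simply tries UTF-8 first and falls back to Latin-1 (the function name decode_whois_response is hypothetical, and this is a heuristic rather than anything the RIRs specify):

```python
def decode_whois_response(data: bytes) -> str:
    """Decode port 43 response bytes: try UTF-8 first, fall back to Latin-1."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte value is valid Latin-1, so this fallback cannot fail.
        return data.decode("latin-1")
```

The heuristic is not perfect: a few Latin-1 byte sequences also happen to be valid UTF-8 and would be decoded as UTF-8.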
> 
> Regards
> Ed Shryane
> RIPE NCC
> 
> 
> 
>> On 2 May 2024, at 16:02, Edward Shryane <[email protected]> wrote:
>> 
>> Dear colleagues,
>> 
>> To follow up on the UTF-8 discussion in January, the DB team plans to 
>> implement support for UTF-8 in 3 phases:
>> 
>> (1) Add a flag to allow a client to choose a character set
>> 
>> In the Whois release 1.112, we have added the "-Z / --charset" query flag to 
>> allow clients to specify which character set they expect. The server 
>> response will encode RPSL objects using that character set.
>> 
>> This new flag can already be tested in the RC environment, e.g. the 
>> SHRYANE-MNT object contains "remarks:" attributes with non-ASCII (but still 
>> latin-1) characters:
>> 
>>   $ whois -h whois-rc.ripe.net -r shryane-mnt
>>   $ whois -h whois-rc.ripe.net -r -Z utf8 shryane-mnt
>> 
>> This flag has no impact on the default behaviour of the RIPE database. This 
>> change only affects port 43, and the default character set remains latin-1.
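
To illustrate the difference between the two character sets, here is a minimal Python sketch that re-encodes latin-1 bytes as UTF-8 on the client side. It is only an illustration of the encodings, not how the Whois server implements the flag, and the function name transcode is hypothetical.

```python
def transcode(data: bytes, source: str = "latin-1", target: str = "utf-8") -> bytes:
    """Re-encode bytes from one character set to another."""
    return data.decode(source).encode(target)


# "\u00e9" is e-acute: one byte in latin-1, two bytes in UTF-8.
latin1_bytes = "remarks: caf\u00e9".encode("latin-1")
utf8_bytes = transcode(latin1_bytes)
print(len(latin1_bytes), len(utf8_bytes))
```

ASCII characters are encoded identically in both character sets, so only attributes containing non-ASCII characters differ between the two responses.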
>> 
>> This flag is already useful, for example to capture responses as UTF-8 
>> to a file or to use UTF-8 encoding in your terminal. In future, if the 
>> default on port 43 changes to UTF-8, clients can keep latin-1 by using 
>> "-Z/--charset latin1".
>> 
>> (2) Convert the database schema to UTF-8
>> 
>> In the following Whois release, the DB team plans to switch the RIPE 
>> database schema character set from latin-1 to UTF-8. This will allow Whois 
>> to store UTF-8 strings in the database index tables.
>> 
>> Switching the database schema character set will involve about 1 hour of 
>> downtime for Whois updates; Whois queries will not be affected. We will 
>> announce this change in advance.
>> 
>> This change will have no impact on the default behaviour of the RIPE 
>> database. All interfaces will behave as before, and RPSL objects will remain 
>> latin-1 encoded internally.
>> 
>> (3) Allow UTF-8 to be used in RPSL objects
>> 
>> Once the RIPE database schema supports the UTF-8 character set, the DB team 
>> will create a further Whois release that will allow UTF-8 to be used in RPSL 
>> objects, in addition to the index tables.
>> 
>> The default behaviour of the RIPE database will remain the same. All 
>> interfaces will behave as before, but RPSL objects will use UTF-8 internally.
>> 
>> In future, if the DB-WG decides to allow UTF-8 characters in RPSL, the 
>> database will already support it.
>> 
>> Regards
>> Ed Shryane
>> RIPE NCC
>> 
>> 
>>> On 18 Jan 2024, at 10:34, Edward Shryane <[email protected]> wrote:
>>> 
>>> Dear colleagues,
>>> 
>>> Based on the discussion regarding UTF-8 in the RIPE database during the 
>>> interim meeting yesterday, I suggest that we implement support for UTF-8 in 
>>> the database (i.e. convert the schema and add a flag to allow a client to 
>>> choose a character set), but we do not allow additional characters for now, 
>>> pending further DB-WG discussion. Our intention is to lay the groundwork 
>>> for future support, without breaking existing functionality. If you have 
>>> any concerns or objections please let me know.
>>> 
>>> We will now prepare an implementation plan / impact analysis of these 
>>> changes.
>>> 
>>> Regards
>>> Ed Shryane
>>> RIPE NCC
>>> 
>>> 
>>>> On 24 Nov 2023, at 10:03, Edward Shryane via db-wg <[email protected]> wrote:
>>>> 
>>>> Dear colleagues,
>>>> 
>>>> Currently the RIPE database only allows a subset of ASCII characters in 
>>>> the "org-name:", "person:" and "role:" attributes, for a few reasons 
>>>> including:
>>>> 
>>>> * These attributes are also a look-up key and the Whois protocol does not 
>>>> allow specifying character sets in queries.
>>>> * RPSL names are ASCII according to RFC 2622
>>>> * Using a normalised name makes the object easier to query
>>>> * A normalised name is easier to read and interpret
>>>> 
>>>> However there are some drawbacks to forcing names to only use a subset of 
>>>> ASCII characters:
>>>> 
>>>> * Organisations, roles and persons cannot use their actual name if it 
>>>> includes characters outside this subset.
>>>> * Normalisation is not standard, but is an interpretation done by each 
>>>> maintainer, e.g. characters could be excluded or converted in different 
>>>> ways.
>>>> 
>>>> Since we support the Latin-1 character set in the RIPE database, I propose 
>>>> we also allow non-ASCII Latin-1 characters in these attributes.
>>>> 
>>>> Querying for a name can be done using either the latin-1 characters 
>>>> (proposed) or a normalised ASCII representation (as at present). The 
>>>> normalised version will be generated by Whois and stored in a database 
>>>> index for querying. The primary key will also be generated from the 
>>>> normalised version.
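
As a rough illustration of what generating a normalised version could look like, here is a minimal Python sketch. The approach (Unicode decomposition, then dropping anything that is not ASCII) is an assumption chosen for illustration, not the rule Whois uses, and the function name normalise_name is hypothetical.

```python
import unicodedata


def normalise_name(name: str) -> str:
    """Return an ASCII-only form of a name (illustrative rule only)."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    # Dropping non-ASCII code points removes the combining marks.
    return decomposed.encode("ascii", "ignore").decode("ascii")


# "Jos\u00e9 Mu\u00f1oz" is "Jose Munoz" written with e-acute and n-tilde.
print(normalise_name("Jos\u00e9 Mu\u00f1oz"))
```

Letters without a decomposition (for example "ø") are dropped rather than converted under this rule, which shows why the exact normalisation has to be agreed rather than left to each implementation.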
>>>> 
>>>> Please let me know your feedback.
>>>> 
>>>> Regards
>>>> Ed Shryane
>>>> RIPE NCC
>>>> 
>>>> ---
>>>> 
>>>> Whois attribute verbose description (copied from the help text).
>>>> 
>>>> org-name
>>>> --------
>>>> Specifies the name of the organisation that this organisation object
>>>> represents in the RIPE Database. This is an ASCII-only text attribute.
>>>> The restriction is because this attribute is a look-up key and the
>>>> whois protocol does not allow specifying character sets in queries.
>>>> The user can put the name of the organisation in non-ASCII character
>>>> sets in the "descr:" attribute if required.
>>>> 
>>>> A list of 1 to 30 words separated by white space. 
>>>> A word is made up of ASCII alphanumeric characters and additionally: 
>>>> ][)(._"*@,&:!'`+/-
>>>> A word may have up to 64 characters and is not case sensitive. 
>>>> Each word can have any combination of the above characters with no 
>>>> restriction on the start or end of a word.
>>>> 
>>>> person
>>>> ------
>>>> Specifies the full name of an administrative, technical or zone
>>>> contact person for other objects in the database.
>>>> 
>>>> It should contain 2 to 10 words.
>>>> A word is made up of ASCII alphanumeric characters and additionally: .`'_-
>>>> The first word should begin with a letter.
>>>> At least one other word should also begin with a letter.
>>>> Max 64 characters can be used in each word.
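
The "person:" rules above can be sketched as a small Python check. This is only a reading of the help text, not the validation code Whois uses; it treats each "should" as a hard rule, and the name is_valid_person is hypothetical.

```python
import re

# One word: ASCII alphanumerics plus . ` ' _ - and at most 64 characters.
_PERSON_WORD = re.compile(r"[A-Za-z0-9.`'_-]{1,64}")


def is_valid_person(name: str) -> bool:
    """Check a "person:" value against the rules in the help text."""
    words = name.split()
    if not 2 <= len(words) <= 10:
        return False
    if not all(_PERSON_WORD.fullmatch(word) for word in words):
        return False
    # The first word must begin with a letter ...
    if not words[0][0].isalpha():
        return False
    # ... and so must at least one of the other words.
    return any(word[0].isalpha() for word in words[1:])
```

Under these rules "Ed Shryane" is accepted, while a name containing a non-ASCII letter such as "é" is rejected.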
>>>> 
>>>> role
>>>> ----
>>>> Specifies the full name of a role entity, e.g. RIPE DBM.
>>>> 
>>>> A list of 1 to 30 words separated by white space.
>>>> A word is made up of ASCII alphanumeric characters and additionally: 
>>>> ][)(._"*@,&:!'`+/-
>>>> A word may have up to 64 characters and is not case sensitive. 
>>>> Each word can have any combination of the above characters with no 
>>>> restriction on the start or end of a word.
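
The "role:" rules (which the help text also gives for "org-name:") can be sketched the same way in Python. Again this is only a reading of the help text, not the validation code Whois uses, and the name is_valid_role_name is hypothetical.

```python
import re

# One word: ASCII alphanumerics plus ] [ ) ( . _ " * @ , & : ! ' ` + / -
# and at most 64 characters.
_ROLE_WORD = re.compile(r"""[A-Za-z0-9\]\[)(._"*@,&:!'`+/-]{1,64}""")


def is_valid_role_name(name: str) -> bool:
    """Check a "role:" or "org-name:" value against the help text rules."""
    words = name.split()
    if not 1 <= len(words) <= 30:
        return False
    return all(_ROLE_WORD.fullmatch(word) for word in words)
```

Under these rules "RIPE DBM" is accepted, while a name containing a non-ASCII Latin-1 letter is rejected, which is the restriction the proposal would relax.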
>>>> 
>>>> 
>>>> -- 
>>>> 
>>>> To unsubscribe from this mailing list, get a password reminder, or change 
>>>> your subscription options, please visit: 
>>>> https://lists.ripe.net/mailman/listinfo/db-wg
>>> 
>> 
> 

