Parsing as a character-string does not necessarily imply unescaping any
sequence other than \". So you parse as a character-string, then parse again as
a comma-separated string, then unescape the components.
Input: "a\,\000,b\,\000\""
Parsed as character-string: a\,\000,b\,\000"
Split on unescaped comma:
a\,\000
b\,\000"
Unescaped:
%x61 %x2c %x00
%x62 %x2c %x00 %x22
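A minimal Python sketch of the two-pass approach above. It assumes the character-string pass has already stripped the surrounding quotes and resolved only \", and it treats input characters as Latin-1 octets. The function names are hypothetical and do not come from any DNS library.

```python
# Hypothetical helpers illustrating the two-pass approach; not a library API.
# Assumption: pass 1 (character-string parsing) already removed the surrounding
# quotes and resolved \" , leaving every other escape sequence untouched.


def split_unescaped_commas(s: str) -> list[str]:
    """Pass 2: split on commas that are not part of an escape sequence."""
    parts, current, i = [], [], 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s):
            # Keep the backslash and the next character together so that an
            # escaped comma never acts as a separator.
            current.append(s[i:i + 2])
            i += 2
        elif s[i] == ",":
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(s[i])
            i += 1
    parts.append("".join(current))
    return parts


def unescape(component: str) -> bytes:
    """Pass 3: resolve \\DDD (decimal) and \\X (literal X) into octets."""
    out, i = bytearray(), 0
    while i < len(component):
        if component[i] != "\\":
            out += component[i].encode("latin-1")
            i += 1
            continue
        digits = component[i + 1:i + 4]
        if len(digits) == 3 and all(d in "0123456789" for d in digits):
            if int(digits) > 255:
                raise ValueError("decimal escape out of range: \\" + digits)
            out.append(int(digits))
            i += 4
        elif i + 1 < len(component):
            out += component[i + 1].encode("latin-1")
            i += 2
        else:
            raise ValueError("dangling backslash")
    return bytes(out)


parsed = 'a\\,\\000,b\\,\\000"'  # the character-string result from the example
print([unescape(p) for p in split_unescaped_commas(parsed)])
# [b'a,\x00', b'b,\x00"']
```

Because this first pass leaves \044 alone, an input such as h2\044h3,h4 also yields the two ids "h2,h3" and "h4" here. That result depends on the first pass really not resolving \DDD escapes.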
But I am still (or again) confused by this text:
ALPNs are identified by their registered "Identification Sequence"
(alpn-id), which is a sequence of 1-255 octets.
alpn-id = 1*255(OCTET)
The presentation value of "alpn" is a comma-separated list of one or
more "alpn-id"s. Any commas present in the protocol-id are escaped
by a backslash:
escaped-octet = %x00-2b / "\," / %x2d-5b / "\\" / %x5D-FF
escaped-id = 1*(escaped-octet)
alpn-value = escaped-id *("," escaped-id)
(1) The text mentions "protocol-id" which is a phrase not found anywhere else
in the text. I think it should probably have said "alpn-id".
(2) The productions above imply that %x00 (null) is a valid character *in the
presentation format*. I think that has to be a mistake. Do we really want
literal nulls in the presentation format?
(3) Or perhaps the escaped-octet rule means that these are terminals found
*after* unescaping sequences like \000?
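Points (2) and (3) can be checked mechanically. Below is a literal transcription of the quoted ABNF into a Python bytes regex. The names `ESCAPED_ID`, `ALPN_VALUE` and `matches_abnf` are illustrative only. Read this way, the grammar accepts a raw NUL octet and has no production for decimal escapes, so a presentation value containing \000 does not match unless some earlier unescaping step is assumed.

```python
import re

# Literal transcription of the quoted ABNF (illustrative names):
#   escaped-octet = %x00-2b / "\," / %x2d-5b / "\\" / %x5D-FF
#   escaped-id    = 1*(escaped-octet)
#   alpn-value    = escaped-id *("," escaped-id)
ESCAPED_ID = rb"(?:[\x00-\x2b\x2d-\x5b\x5d-\xff]|\\[,\\])+"
ALPN_VALUE = re.compile(ESCAPED_ID + rb"(?:," + ESCAPED_ID + rb")*")


def matches_abnf(presentation: bytes) -> bool:
    return ALPN_VALUE.fullmatch(presentation) is not None


print(matches_abnf(b"h2\\,h3,h4"))  # True: "\," keeps the comma inside one id
print(matches_abnf(b"h2\x00"))      # True: a literal NUL octet is accepted
print(matches_abnf(b"h2\\000"))     # False: no production for \DDD
```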
- lc
> On Jun 26, 2020, at 17:11, Mark Andrews <[email protected]> wrote:
>
> Except you can't actually do that. '\044' becomes ',' on the first pass if
> you parse it as a character string first. The ONLY way this works is if you
> remember which commas are escaped or not (\044 or \, vs ,). It’s dead easy
> to split it into alpn-id as you unescape the string.
>
> Mark
>
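The single-pass alternative described above can be sketched as follows. The function name is hypothetical, and input characters are treated as Latin-1 octets (an assumption of the sketch). The decision "separator or data" is made before any unescaping, so \044 and \, both produce a comma octet inside the current alpn-id.

```python
# Hypothetical single-pass parser in the spirit of the message above.
# Assumption: `value` is the text after "alpn=" with any surrounding double
# quotes removed; \" is handled by the generic \X rule below.


def parse_alpn_one_pass(value: str) -> list[bytes]:
    ids, current, i = [], bytearray(), 0
    while i < len(value):
        c = value[i]
        if c == "\\":
            digits = value[i + 1:i + 4]
            if len(digits) == 3 and all(d in "0123456789" for d in digits):
                if int(digits) > 255:
                    raise ValueError("decimal escape out of range: \\" + digits)
                # An escaped comma (\044) becomes a data octet, never a separator.
                current.append(int(digits))
                i += 4
            elif i + 1 < len(value):
                current += value[i + 1].encode("latin-1")  # \, \\ \" \h ...
                i += 2
            else:
                raise ValueError("dangling backslash")
        elif c == ",":
            ids.append(bytes(current))  # only an unescaped comma ends an id
            current = bytearray()
            i += 1
        else:
            current += c.encode("latin-1")
            i += 1
    ids.append(bytes(current))
    if not all(1 <= len(alpn_id) <= 255 for alpn_id in ids):
        raise ValueError("each alpn-id must be 1-255 octets")
    return ids


print(parse_alpn_one_pass("h2\\,h3,h4"))    # [b'h2,h3', b'h4']
print(parse_alpn_one_pass("h2\\044h3,h4"))  # [b'h2,h3', b'h4']
```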
>> On 18 Jun 2020, at 23:53, [email protected] wrote:
>>
>> OK, I think I now understand the intent, and refactored my code accordingly,
>> and it is now simpler and cleaner. Yay.
>>
>> I think it would be clearer to implementers if section 2.1.1 said that all
>> values are initially parsed as character-strings (allowed to exceed 255
>> characters), and then further parsed by SvcParamKey-specific parsing which
>> may, for example, split on comma. I think the current text isn't entirely
>> clear on the functional separation between generic parsing and key-specific
>> parsing.
>>
>> - lc
>>
>>
>>> On Jun 15, 2020, at 22:04, Mark Andrews <[email protected]> wrote:
>>>
>>>
>>>
>>>> On 14 Jun 2020, at 05:01, Larry Campbell
>>>> <[email protected]> wrote:
>>>>
>>>> I think there's an implementation difficulty. Consider:
>>>>
>>>> 1. alpn=h2 ; clear enough
>>>> 2. alpn="h2" ; should be equivalent
>>>> 3. alpn=\h\2 ; should also be equivalent
>>>> 4. alpn=h2,h3 ; ok (two values)
>>>> 5. alpn="h2","h3" ; should be equivalent
>>>
>>> No, as it is key=quoted-string as per 2.1.1 not
>>> key=quoted-string(,quoted-string)*
>>>
>>>> 6. alpn="h2,h3" ; malformed? or a single alpn value of h2,h3? or two
>>>> three-character values, "h2 and h3”?
>>>
>>> this is correct
>>>
>>>> 7. alpn=h2\,h3,h4 ; how should this be parsed?
>>>
>>> 0x05 0x68 0x32 0x2c 0x68 0x33 0x02 0x68 0x34
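For reference, a sketch of the length-prefixed encoding behind that byte string (the function name is hypothetical). Each alpn-id is preceded by a single length octet, consistent with the 1-255 octet limit, and the escaped comma ends up as the data octet 0x2c inside the first alpn-id.

```python
# Hypothetical helper: encode parsed alpn-ids as length-prefixed octets,
# the form shown in the reply above.


def alpn_wire(ids: list[bytes]) -> bytes:
    out = bytearray()
    for alpn_id in ids:
        if not 1 <= len(alpn_id) <= 255:
            raise ValueError("alpn-id must be 1-255 octets")
        out.append(len(alpn_id))  # one length octet per alpn-id
        out += alpn_id
    return bytes(out)


# alpn=h2\,h3,h4 parses to the ids "h2,h3" and "h4"
print(alpn_wire([b"h2,h3", b"h4"]).hex(" "))
# 05 68 32 2c 68 33 02 68 34
```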
>>>
>>>> Section 2.1.1 tempts one to build the obvious implementation of using
>>>> one's existing character-string parser, and then passing the parsed
>>>> character-string to the individual handler for each key type. The alpn and
>>>> ipv*hint handlers are going to want to split that character-string on
>>>> comma. That would treat #6 as two two-character values (h2,h3). But #7 is
>>>> problematic: the generic character-string parser would remove the
>>>> backslash, and then the alpn handler would treat this as three alpn values
>>>> when you probably wanted just two.
>>>
>>> When you are also parsing domain names you have to deal with \. being a
>>> literal period not a domain separator.
>>> exa\.mple.com and "exa\.mple.com" both parse as two labels, 'exa.mple' and
>>> 'com'. This is not really different.
>>>
>>> That said we do need to address this issue.
>>>
>>> In BIND we extract quoted-string preserving the escapes (except for '\"')
>>> then pass the token to a domain name parser or a text parser. Having 'key='
>>> preceding the quoted-string is more of an issue and we have to shift modes
>>> mid-token.
>>>
>>>> We could make a special character-string parser for alpn and ipv*hint,
>>>> that handles commas, but it feels odd to have to use a special parser just
>>>> for certain key types. However, if we must allow commas in alpn names,
>>>> then we have no choice.
>>>
>>> You need to reparse the value for port, alpn, and ipv*hint.
>>>
>>>> Perhaps it would be clearer to simply remove the three paragraphs of
>>>> section 2.1.1 beginning with "The presentation format for SvcFieldValue
>>>> is..." and ending with "...not limited to 255 characters.)". Since the
>>>> previous paragraph says "Values are in a format specific to the
>>>> SvcParamKey", perhaps it would be best to leave the description of each
>>>> value format in the appropriate part of section 6 and for section 2.1.1 to
>>>> discuss only how to represent and parse unrecognized keys.
>>>
>>>
>>>>
>>>> To keep the implementation simple, the alpn value could be defined as a
>>>> comma-separated list of sequences of printing ASCII characters, with
>>>> embedded comma represented as \, backslash as \\, and all nonprinting and
>>>> non-ASCII characters represented as \nnn. (In other words, the full
>>>> generality of character-string, particularly double-quotes, is not needed
>>>> here.)
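The simplified presentation form proposed above can be sketched as an encoder (names hypothetical). One assumption goes beyond the proposal: space, '"', ';', '(' and ')' are also written as \DDD so the output is safe as an unquoted zone-file token. Under this scheme a NUL octet never appears literally in the presentation format.

```python
# Hypothetical encoder for the simplified alpn presentation form proposed
# above. Assumption beyond the proposal: space, '"', ';', '(' and ')' are
# also written as \DDD so the result is safe as an unquoted zone-file token.

UNSAFE_PRINTABLE = set(b' "();')


def escape_alpn_id(alpn_id: bytes) -> str:
    out = []
    for octet in alpn_id:
        if octet == 0x2C:  # comma
            out.append("\\,")
        elif octet == 0x5C:  # backslash
            out.append("\\\\")
        elif 0x20 <= octet <= 0x7E and octet not in UNSAFE_PRINTABLE:
            out.append(chr(octet))
        else:  # control characters, non-ASCII, and the assumed-unsafe set
            out.append("\\%03d" % octet)
    return "".join(out)


def format_alpn(ids: list[bytes]) -> str:
    return ",".join(escape_alpn_id(alpn_id) for alpn_id in ids)


print(format_alpn([b"h2,h3", b"h4"]))   # h2\,h3,h4
print(format_alpn([b"a\\b\x00\xff"]))   # a\\b\000\255
```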
>>>>
>>>> The other comma-separated value types -- ipv4hint and ipv6hint -- do not
>>>> have this difficulty; they also don't need the full generality of
>>>> character-string handling, because the individual values can contain only
>>>> hex digits, periods, and colons, so their specification and implementation
>>>> can be much simpler.
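The point about the address hints can be illustrated with the standard-library `ipaddress` module. IPv4 and IPv6 literals never contain commas or backslashes, so a plain split is enough. This is a sketch with hypothetical function names, and it does no zone-file unescaping at all.

```python
import ipaddress

# Hypothetical parsers for ipv4hint/ipv6hint presentation values. Address
# literals contain only hex digits, '.', and ':', so no escape handling is done.


def parse_ipv4hint(value: str) -> list[ipaddress.IPv4Address]:
    # ipaddress raises a ValueError subclass for empty or malformed elements
    return [ipaddress.IPv4Address(part) for part in value.split(",")]


def parse_ipv6hint(value: str) -> list[ipaddress.IPv6Address]:
    return [ipaddress.IPv6Address(part) for part in value.split(",")]


print([str(a) for a in parse_ipv4hint("192.0.2.1,192.0.2.2")])
# ['192.0.2.1', '192.0.2.2']
```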
>>>>
>>>> And I think section 2.1.1 would be clearer if
>>>>
>>>> using decimal escape codes (e.g. \255) when necessary
>>>>
>>>> were replaced by
>>>>
>>>> using decimal escape codes (e.g. \255) for all nonprinting and non-ASCII
>>>> characters, and using \\ to represent backslash
>>>>
>>>> - lc
>>>>
>>>>
>>>>> On Jun 13, 2020, at 11:25, Ben Schwartz
>>>>> <[email protected]> wrote:
>>>>>
>>>>> Larry,
>>>>>
>>>>> I think that's the intent of the current text, especially the ABNF for
>>>>> "element". If you think it's unclear, we should adjust it. Please
>>>>> suggest text!
>>>>>
>>>>> --Ben Schwartz
>>>>>
>>>>> On Sat, Jun 13, 2020, 10:53 AM Larry Campbell
>>>>> <[email protected]> wrote:
>>>>> Section 6.1 says:
>>>>>
>>>>>> The presentation value of "alpn" is a comma-separated list of one or
>>>>>> more "alpn-id"s. Any commas present in the protocol-id are escaped by a
>>>>>> backslash:
>>>>>>
>>>>>> escaped-octet = %x00-2b / "\," / %x2d-5b / "\\" / %x5D-FF
>>>>>> escaped-id = 1*(escaped-octet)
>>>>>> alpn-value = escaped-id *("," escaped-id)
>>>>>
>>>>> If I read this correctly, the presentation value is allowed to contain
>>>>> nulls and control characters. This seems likely to make such records very
>>>>> difficult to edit. Wouldn't it be better to require these to be encoded
>>>>> as \nnn?
>>>>>
>>>>> - lc
>>>>>
>>>>> _______________________________________________
>>>>> DNSOP mailing list
>>>>> [email protected]
>>>>> https://www.ietf.org/mailman/listinfo/dnsop
>>>>>
>>>>
>>>
>>> --
>>> Mark Andrews, ISC
>>> 1 Seymour St., Dundas Valley, NSW 2117, Australia
>>> PHONE: +61 2 9871 4742 INTERNET: [email protected]