Parsing as a character-string does not necessarily imply unescaping any 
sequence other than \". So you parse as a character-string, then parse again as 
a comma-separated string, then unescape the components.

Input: "a\,\000,b\,\000\""

Parsed as character-string: a\,\000,b\,\000"

Split on unescaped comma:
    a\,\000
    b\,\000"

Unescaped:
    %x61 %x2c %x00
    %x62 %x2c %x00 %x22
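A minimal Python sketch of the three steps just described. It assumes the value arrives as one token (quotes included) and that the presentation text is ASCII. The helper names are hypothetical and do not come from the draft or any implementation. The first pass drops the surrounding quotes and rewrites only \" (much like extracting the quoted-string while preserving other escapes), so every other escape survives until the final step.

```python
from typing import List


def first_pass(text: str) -> str:
    """Drop surrounding quotes and unescape only \\"; keep every other escape."""
    quoted = text.startswith('"')
    body = text[1:] if quoted else text
    out, i = [], 0
    while i < len(body):
        c = body[i]
        if c == '\\':
            if i + 1 >= len(body):
                raise ValueError("trailing backslash")
            nxt = body[i + 1]
            # \" becomes a bare quote; any other escape passes through intact.
            out.append('"' if nxt == '"' else '\\' + nxt)
            i += 2
        elif quoted and c == '"':
            if i != len(body) - 1:
                raise ValueError("data after closing quote")
            return ''.join(out)
        else:
            out.append(c)
            i += 1
    if quoted:
        raise ValueError("unterminated quoted string")
    return ''.join(out)


def split_on_unescaped_comma(s: str) -> List[str]:
    """Second pass: split on bare commas, keeping each backslash pair together."""
    parts, cur, i = [], [], 0
    while i < len(s):
        if s[i] == '\\' and i + 1 < len(s):
            cur.append(s[i:i + 2])
            i += 2
        elif s[i] == ',':
            parts.append(''.join(cur))
            cur = []
            i += 1
        else:
            cur.append(s[i])
            i += 1
    parts.append(''.join(cur))
    return parts


def unescape(s: str) -> bytes:
    """Final step: decode \\DDD and \\X escapes into raw octets."""
    out, i = bytearray(), 0
    while i < len(s):
        if s[i] == '\\':
            digits = s[i + 1:i + 4]
            if len(digits) == 3 and digits.isascii() and digits.isdigit():
                value = int(digits)
                if value > 255:
                    raise ValueError("decimal escape out of range")
                out.append(value)
                i += 4
            elif i + 1 < len(s):
                out += s[i + 1].encode('ascii')
                i += 2
            else:
                raise ValueError("trailing backslash")
        else:
            out += s[i].encode('ascii')
            i += 1
    return bytes(out)


def parse_alpn_two_pass(text: str) -> List[bytes]:
    return [unescape(p) for p in split_on_unescaped_comma(first_pass(text))]
```

Because the first pass leaves \044 untouched, h2\044h3,h4 also splits into two ids here; that comma only becomes an ordinary octet in the last step.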

But I am still (or again) confused by this text:

   ALPNs are identified by their registered "Identification Sequence"
   (alpn-id), which is a sequence of 1-255 octets.

   alpn-id = 1*255(OCTET)

   The presentation value of "alpn" is a comma-separated list of one or
   more "alpn-id"s.  Any commas present in the protocol-id are escaped
   by a backslash:

   escaped-octet = %x00-2b / "\," / %x2d-5b / "\\" / %x5D-FF
   escaped-id = 1*(escaped-octet)
   alpn-value = escaped-id *("," escaped-id)


(1) The text mentions "protocol-id" which is a phrase not found anywhere else 
in the text. I think it should probably have said "alpn-id".

(2) The productions above imply that %x00 (null) is a valid character *in the 
presentation format*. I think that has to be a mistake. Do we really want 
literal nulls in the presentation format?

(3) Or perhaps the escaped-octet rule means that these are terminals found 
*after* unescaping sequences like \000?
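To make (2) and (3) concrete, here is one literal reading of the quoted ABNF, transcribed by hand into a Python bytes regular expression (the transcription and the name matches_alpn_value are illustrative, not from the draft). Read as a grammar for the presentation text itself, it accepts a raw NUL octet but rejects the decimal escape \000, because a bare backslash is only allowed before a comma or another backslash.

```python
import re

# Hand transcription of escaped-octet / escaped-id / alpn-value over raw octets.
ESCAPED_OCTET = rb'(?:[\x00-\x2b]|\\,|[\x2d-\x5b]|\\\\|[\x5d-\xff])'
ESCAPED_ID = ESCAPED_OCTET + rb'+'
ALPN_VALUE = re.compile(ESCAPED_ID + rb'(?:,' + ESCAPED_ID + rb')*')


def matches_alpn_value(data: bytes) -> bool:
    """True if the whole input matches alpn-value as written."""
    return ALPN_VALUE.fullmatch(data) is not None


print(matches_alpn_value(b'h2\x00'))   # True: a raw NUL octet is accepted
print(matches_alpn_value(b'h2\\000'))  # False: a bare backslash is not allowed
```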

- lc


> On Jun 26, 2020, at 17:11, Mark Andrews <[email protected]> wrote:
> 
> Except you can’t actually do that.  ‘\044’ becomes ‘,’ on the first pass if 
> you parse it as a character string first. The ONLY way this works is if you 
> remember which commas are escaped or not (\044 or \, vs ,).  It’s dead easy 
> to split it into alpn-id as you unescape the string.
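A Python sketch of splitting while unescaping: decoding and splitting happen in the same loop, so a comma that came from \, or \044 is always treated as data and only a bare comma separates ids. It assumes any surrounding quotes were already removed, that the text is ASCII, and that \DDD and \X are the only escape forms; split_alpn is a hypothetical name.

```python
from typing import List


def split_alpn(value: str) -> List[bytes]:
    """Unescape and split on bare commas in one pass (quotes already removed)."""
    ids: List[bytes] = []
    cur = bytearray()
    i = 0
    while i < len(value):
        c = value[i]
        if c == '\\':
            digits = value[i + 1:i + 4]
            if len(digits) == 3 and digits.isascii() and digits.isdigit():
                octet = int(digits)
                if octet > 255:
                    raise ValueError(f"escape out of range: \\{digits}")
                cur.append(octet)  # \044 lands here as data, never as a separator
                i += 4
            elif i + 1 < len(value):
                cur += value[i + 1].encode('ascii')  # \, and \\ are data too
                i += 2
            else:
                raise ValueError("trailing backslash")
        elif c == ',':
            ids.append(bytes(cur))  # only a bare comma separates alpn-ids
            cur = bytearray()
            i += 1
        else:
            cur += c.encode('ascii')
            i += 1
    ids.append(bytes(cur))
    for alpn_id in ids:
        if not 1 <= len(alpn_id) <= 255:
            raise ValueError("each alpn-id must be 1-255 octets")
    return ids
```

If a generic character-string pass had already decoded \044 to a comma, the same input would reach the splitter as h2,h3,h4 and come out as three ids.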
> 
> Mark
> 
>> On 18 Jun 2020, at 23:53, [email protected] wrote:
>> 
>> OK, I think I now understand the intent, and refactored my code accordingly, 
>> and it is now simpler and cleaner. Yay.
>> 
>> I think it would be clearer to implementers if section 2.1.1 said that all 
>> values are initially parsed as character-strings (allowed to exceed 255 
>> characters), and then further parsed by SvcParamKey-specific parsing which 
>> may, for example, split on comma. I think the current text isn't entirely 
>> clear on the functional separation between generic parsing and key-specific 
>> parsing.
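The split suggested here, generic parsing first and SvcParamKey-specific parsing second, might look like the following sketch. It assumes the generic pass has already removed quotes and unescaped only \", leaving other escapes in place. The KEY_PARSERS registry, the fallback of fully unescaping unrecognized keys, and the two-octet big-endian port encoding are assumptions for illustration; alpn and ipv*hint parsers would be registered the same way.

```python
from typing import Callable, Dict


def unescape(raw: str) -> bytes:
    """Decode \\DDD and \\X escapes left in place by the generic pass."""
    out, i = bytearray(), 0
    while i < len(raw):
        if raw[i] == '\\':
            digits = raw[i + 1:i + 4]
            if len(digits) == 3 and digits.isascii() and digits.isdigit():
                if int(digits) > 255:
                    raise ValueError("decimal escape out of range")
                out.append(int(digits))
                i += 4
            elif i + 1 < len(raw):
                out += raw[i + 1].encode('ascii')
                i += 2
            else:
                raise ValueError("trailing backslash")
        else:
            out += raw[i].encode('ascii')
            i += 1
    return bytes(out)


def parse_port(raw: str) -> bytes:
    """Key-specific parser: decimal port to two octets, big-endian."""
    text = unescape(raw).decode('ascii')
    if not text.isdigit() or int(text) > 65535:
        raise ValueError(f"bad port: {raw!r}")
    return int(text).to_bytes(2, 'big')


# Hypothetical registry of key-specific parsers; only "port" is filled in here.
KEY_PARSERS: Dict[str, Callable[[str], bytes]] = {'port': parse_port}


def parse_value(key: str, raw: str) -> bytes:
    """Dispatch on the key, falling back to a plain unescape for unknown keys."""
    return KEY_PARSERS.get(key, unescape)(raw)
```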
>> 
>> - lc
>> 
>> 
>>> On Jun 15, 2020, at 22:04, Mark Andrews <[email protected]> wrote:
>>> 
>>> 
>>> 
>>>> On 14 Jun 2020, at 05:01, Larry Campbell 
>>>> <[email protected]> wrote:
>>>> 
>>>> I think there's an implementation difficulty. Consider:
>>>> 
>>>> 1.  alpn=h2                ; clear enough
>>>> 2.  alpn="h2"              ; should be equivalent
>>>> 3.  alpn=\h\2              ; should also be equivalent
>>>> 4.  alpn=h2,h3             ; ok (two values)
>>>> 5.  alpn="h2","h3"         ; should be equivalent
>>> 
>>> No, as it is key=quoted-string as per 2.1.1, not 
>>> key=quoted-string(,quoted-string)*
>>> 
>>>> 6.  alpn="h2,h3"           ; malformed? or a single alpn value of h2,h3? or two 
>>>> two-character values, "h2" and "h3"?
>>> 
>>> this is correct
>>> 
>>>> 7.  alpn=h2\,h3,h4         ; how should this be parsed?
>>> 
>>> 0x05 0x68 0x32 0x2c 0x68 0x33 0x02 0x68 0x34
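The octets for #7 follow from writing each alpn-id as a one-octet length followed by its octets, one id after another; the escaped comma kept inside the first id is ASCII 0x2c. A small sketch, where encode_alpn_wire is a hypothetical helper and the layout is inferred from the byte listing rather than quoted from the draft:

```python
from typing import List


def encode_alpn_wire(ids: List[bytes]) -> bytes:
    """Concatenate each alpn-id as a one-octet length followed by its octets."""
    out = bytearray()
    for alpn_id in ids:
        if not 1 <= len(alpn_id) <= 255:
            raise ValueError("each alpn-id must be 1-255 octets")
        out.append(len(alpn_id))
        out += alpn_id
    return bytes(out)


print(encode_alpn_wire([b'h2,h3', b'h4']).hex(' '))  # prints 05 68 32 2c 68 33 02 68 34
```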
>>> 
>>>> Section 2.1.1 tempts one to build the obvious implementation of using 
>>>> one's existing character-string parser, and then passing the parsed 
>>>> character-string to the individual handler for each key type. The alpn and 
>>>> ipv*hint handlers are going to want to split that character-string on 
>>>> comma. That would treat #6 as two two-character values (h2,h3). But #7 is 
>>>> problematic: the generic character-string parser would remove the 
>>>> backslash, and then the alpn handler would treat this as three alpn values 
>>>> when you probably wanted just two.
>>> 
>>> When you are also parsing domain names you have to deal with \. being a 
>>> literal period not a domain separator.
>>> exa\.mple.com and “exa\.mple.com” both consist of two labels, ‘exa.mple’ and 
>>> ‘com’.  This is not really different.
>>> 
>>> That said we do need to address this issue.
>>> 
>>> In BIND we extract quoted-string preserving the escapes (except for ‘\”’) 
>>> then pass the token to a domain name parser or a text parser. Having ‘key=‘ 
>>> preceding the quoted-string is more of an issue and we have to shift modes 
>>> mid-token.
>>> 
>>>> We could make a special character-string parser for alpn and ipv*hint, 
>>>> that handles commas, but it feels odd to have to use a special parser just 
>>>> for certain key types. However, if we must allow commas in alpn names, 
>>>> then we have no choice.
>>> 
>>> You need to reparse the value for port, alpn, and ipv*hint.
>>> 
>>>> Perhaps it would be clearer to simply remove the three paragraphs of 
>>>> section 2.1.1 beginning with "The presentation for SvcFieldValue 
>>>> is..." and ending with "...not limited to 255 characters.)". Since the 
>>>> previous paragraph says "Values are in a format specific to the 
>>>> SvcParamKey", perhaps it would be best to leave the description of each 
>>>> value format in the appropriate part of section 6 and for section 2.1.1 to 
>>>> discuss only how to represent and parse unrecognized keys.
>>> 
>>> 
>>>> 
>>>> To keep the implementation simple, the alpn value could be defined as a 
>>>> comma-separated list of sequences of printing ASCII characters, with 
>>>> embedded comma represented as \, backslash as \\, and all nonprinting and 
>>>> non-ASCII characters represented as \nnn. (In other words, the full 
>>>> generality of character-string, particularly double-quotes, is not needed 
>>>> here.)
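A sketch of the output side of this proposal, turning wire-format alpn-ids back into presentation text: printing ASCII is emitted as-is, comma becomes \, and backslash becomes \\, and everything else becomes \nnn. Treating space (0x20) as needing an escape is an assumption of this sketch, since a bare space would end the token; other zone-file specials such as ; and ( are not handled here. format_alpn is a hypothetical name.

```python
from typing import List


def format_alpn(ids: List[bytes]) -> str:
    """Render wire-format alpn-ids using \\, \\\\ and \\nnn escapes."""
    parts = []
    for alpn_id in ids:
        out = []
        for b in alpn_id:
            if b == 0x2C:                # comma
                out.append('\\,')
            elif b == 0x5C:              # backslash
                out.append('\\\\')
            elif 0x21 <= b <= 0x7E:      # printing ASCII, space excluded
                out.append(chr(b))
            else:                        # space, control and non-ASCII octets
                out.append(f'\\{b:03d}')
        parts.append(''.join(out))
    return ','.join(parts)
```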
>>>> 
>>>> The other comma-separated value types -- ipv4hint and ipv6hint -- do not 
>>>> have this difficulty; they also don't need the full generality of 
>>>> character-string handling, because the individual values can contain only 
>>>> hex digits, periods, and colons, so their specification and implementation 
>>>> can be much simpler.
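Because ipv4hint and ipv6hint values contain only hex digits, periods, and colons, a parser can split on commas directly and hand each piece to an address parser, with no escape handling at all. A Python sketch using the standard ipaddress module; it assumes the wire form is simply the packed addresses concatenated (4 or 16 octets each), and the function names are hypothetical.

```python
import ipaddress


def parse_ipv4hint(value: str) -> bytes:
    """Split on commas and pack each IPv4 address into 4 octets."""
    return b''.join(ipaddress.IPv4Address(a).packed for a in value.split(','))


def parse_ipv6hint(value: str) -> bytes:
    """Split on commas and pack each IPv6 address into 16 octets."""
    return b''.join(ipaddress.IPv6Address(a).packed for a in value.split(','))
```

An empty element (for example a trailing comma) makes the address parser raise, which is probably the behavior you want for a malformed hint.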
>>>> 
>>>> And I think section 2.1.1 would be clearer if
>>>> 
>>>>  using decimal escape codes (e.g. \255) when necessary
>>>> 
>>>> were replaced by
>>>> 
>>>>  using decimal escape codes (e.g. \255) for all nonprinting and non-ASCII 
>>>> characters, and using \\ to represent backslash
>>>> 
>>>> - lc
>>>> 
>>>> 
>>>>> On Jun 13, 2020, at 11:25, Ben Schwartz 
>>>>> <[email protected]> wrote:
>>>>> 
>>>>> Larry,
>>>>> 
>>>>> I think that's the intent of the current text, especially the ABNF for 
>>>>> "element".  If you think it's unclear, we should adjust it.  Please 
>>>>> suggest text!
>>>>> 
>>>>> --Ben Schwartz
>>>>> 
>>>>> On Sat, Jun 13, 2020, 10:53 AM Larry Campbell 
>>>>> <[email protected]> wrote:
>>>>> Section 6.1 says:
>>>>> 
>>>>>> The presentation value of "alpn" is a comma-separated list of one or 
>>>>>> more "alpn-id"s. Any commas present in the protocol-id are escaped by a 
>>>>>> backslash:
>>>>>> 
>>>>>>  escaped-octet = %x00-2b / "\," / %x2d-5b / "\\" / %x5D-FF
>>>>>>  escaped-id = 1*(escaped-octet)
>>>>>>  alpn-value = escaped-id *("," escaped-id)
>>>>> 
>>>>> If I read this correctly, the presentation value is allowed to contain 
>>>>> nulls and control characters. This seems likely to make such records very 
>>>>> difficult to edit. Wouldn't it be better to require these to be encoded 
>>>>> as \nnn?
>>>>> 
>>>>> - lc
>>>>> 
>>>>> _______________________________________________
>>>>> DNSOP mailing list
>>>>> [email protected]
>>>>> https://www.ietf.org/mailman/listinfo/dnsop
>>>>>  
>>>> 
>>> 
>>> -- 
>>> Mark Andrews, ISC
>>> 1 Seymour St., Dundas Valley, NSW 2117, Australia
>>> PHONE: +61 2 9871 4742              INTERNET: [email protected]
> 
> -- 
> Mark Andrews, ISC
> 1 Seymour St., Dundas Valley, NSW 2117, Australia
> PHONE: +61 2 9871 4742              INTERNET: [email protected]
> 
