Re: Naming and discoverability of fetches and converters

Tim Düsterhus Wed, 23 Sep 2020 14:04:14 -0700

Willy,

please excuse my belated reply. First the weekend was there and then
subsequently I forgot about your email. Unfortunately no one else wanted
to add their 2ct so far :-( I believe this is a topic that could benefit
from administrator opinions.


Am 18.09.20 um 15:26 schrieb Willy Tarreau:
>> Then there's quite a few redundant converters. We have `sha1`, then
>> `sha2` and most recently we have `digest`.
> 
> From what I'm reading in the commit message (the doc not being clear
> on this point), they don't seem to overlap as "digest" seems to return
> an HMAC. In addition, "sha1" doesn't require openssl so it comes for
> free. But anyway I get your point and generally agree.

No, digest() is a plain hash. hmac() is for HMACs :-) Both were added in
the same commit, though.

>> Then within the description of the converters the variable naming and
>> scoping is explained over and over again for each converter that takes
>> variable names.
> 
> Did you also notice that copy-pasting is quite a common practice in the
> doc :-/

Once there's precedent you don't really feel confident rewriting the
entire docs just to add a converter and simply copy over the exact same
paragraph yet again.

>> But there's also
>> a difference in naming. Sometimes the namespace is separated by a dot
>> (`req.hdr`). Sometime it's an underscore (`ssl_fc`).
> 
> Yep. The dot was added later to try to group all those working within
> the same domain (e.g. the request or the response). We also (re)discovered
> that the automatic binding making them directly available from Lua is less
> transparent with the dot as that one has to be turned to an underscore
> there. But it's less common and the dot still appears much more readable
> to me.

Yes, I agree. Maybe it's because the dot feels natural from the member
access of a struct in C (or as a property access in many other
programming languages). It feels like a natural namespace separator to me.

>> Long story short: Is it feasible to update the canonical names of
>> fetches and converters, preserving backwards compatibility with old
>> versions and adding the missing "duals" for encoders / decoders or do we
>> have to live with the current situation.
> 
> We can *try*. Trust me, it is *extremely hard* to correctly name things.
> A significant part of the development job is to find correct names for
> variables, functions, struct members, and worse, config options! You want
> non-confusing names that accurately represent what you're designating with
> a very limited risk that it later degenerates into something broader and
> confusing again.
> 
>> -------
>>
>> Let me give a proposal of what makes sense to me:
>>
>> Make 's.' contain string related converters.
>>
>> s.field() instead of field()
>> s.word() instead of word()
>> s.json() instead of json()
> 
> Maybe, why not for these ones. But then json() alone cannot stay
> this way and needs to be called json-enc() or something like this
> since it only encodes and doesn't decode.

Technically even json_enc() is not entirely appropriate, because the
only thing it does is encoding a value so that the result is a valid
contents of a string in JSON. Specifically bool(1),json() is simply `1`,
not `true`. And str("foo"),json() is `foo`, not `"foo"`.

>> Make 'http.' contain HTTP related converters.
>>
>> http.language() instead of language()
> 
> This already becomes difficult. http is very wide and touches all
> areas. For example you'll use http.date() to encode an HTTP date
> while plenty of users will search for it in "date.something" because
> it's just one way to represent a date, or "time.something" because a
> date is first and foremost something related to time.

I guess for that date() would best be generalized to take the proper
format as a parameter. Allowing e.g. date(iso8601), initially only
supporting dates for use in HTTP header.

>> Make 'math.' contain mathematics related converters.
>>
>> math.add() instead of add()
>> math.sub() instead of sub()
> 
> And here I'd say that it will turn a number of expressions to
> something very hard to read. We purposely tried to keep such
> operators quite compact to keep expressions readable. I'd argue
> that ",mul(3),add(21),div(32)" is quite more readable than
> ",math.mul(3),math.add(21),math.div(32)". I'm not fundamentally
> opposed, but I'm pretty sure that because of this nobody will ever
> use the long form, and we'll have added more noise in the doc
> instead. 

I must admit that I never used the math operators for anything before.
No idea how commonly they are used in real world configurations. A
prefix certainly would make them unwieldy for longer expressions.

>> Make 'sec.' contain security related converters.
>>
>> sec.digest() instead of digest()
>> deprecate sha1, sha2 in favor of digest()
> 
> Actually I'd rather have "hash.something" for everything related to
> hashing, as most of the time it has nothing to do with security, and
> making people think that "djb2" or "crc32" are secure is a problem for
> me, and personnaly I don't see a reason why these ones wouldn't appear
> next to "sha1" or "digest".

Yes, that makes sense to me.

>> sec.secure_memcmp() instead of secure_memcmp()
> 
> Maybe "sec.memcmp()" or s.secure_memcmp() since it's about strings ?

Here we are starting bikeshedding. Personally I would be happy with any
improvement and I just attempted to give a few examples. Someone
definitely needs to draft up a consistent and complete set to allow
giving proper feedback. Discussing about single converters is not
helping, I guess.

>> The biggest benefit to namespacing for me is that related converters
>> appear below each other within configuration.txt. The most commonly
>> required converters could still remain in the default namespace (or
>> perhaps rather aliased into the default namespace). I don't believe that
>> 's.' for the string stuff is too bad, though.
> 
> Now I'm wondering if we shouldn't consider a variant of this proposal.
> Apparently the main issue is with finding them, so they should be better
> classified. We could imagine having them sorted by use case in the doc,
> even with redundancy, and for each of them, suggesting a long and a short
> name (this one optional). We could then consider that certain name first
> and foremost belong to one category (often related to what file it's
> implemented in), and inherits the long name from this category, but also
> has a short name. The keyword wouldn't be decribed in details there, it
> would just be an index, allowing the same one to appear at multiple
> places. You could then have:
> 
>  Security-oriented converters
>    long name          | short name  | purpose
>    -------------------+-------------+---------------------------------------
>    crypto.aes_gcm_enc | aes_gcm_enc | encode binary input using AES_GCM
>    crypto.aes_gcm_dec | aes_gcm_dec | decode binary input using AES_GCM
>    crypto.sha2        | sha2        | hashes binary input using SHA2
> 
> ...
>  Encoding functions
>    long name          | short name  | purpose
>    -------------------+-------------+---------------------------------------
>    crypto.aes_gcm_enc | aes_gcm_enc | encode binary input using AES_GCM
>    crypto.aes_gcm_dec | aes_gcm_dec | decode binary input using AES_GCM
>    s.tolower          | lower       | lower case input string
>    s.toupper          | upper       | lower case input string
>    s.b64enc           | base64      | encode using base64
>    s.b64dec           | b64dec      | decode using base64
>    http.url_enc       | url_enc     | URL-encode binary input
> ...
>  Hashing function
>    long name          | short name  | purpose
>    -------------------+-------------+---------------------------------------
>    hash.crc32         | crc32       | hashes binary input using crc32 algo
>    hash.djb2          | djb2        | hashes binary input using djb2 algo
>    hash.wt6           | wt6         | hashes binary input using wt6 algo
>    hash.sha1          | sha1        | hashes binary input using SHA1
>    crypto.sha2        | sha2        | hashes binary input using SHA2
>    crypto.digest      | digest      | hashes binary input using configurable 
> algos
> ...
> 
> So there will definitely be some redundancy, which translates the reality,
> and the keywords will only be indexed using their long names. As such they
> will be much easier to find depending on what a user is trying to do, it
> could easily be said at one place that all the crypto.xxx stuff is only
> available when openssl is enabled, and overall we could simplify the search
> there.
> 
> What do you think ?
> 

Yes, such a categorized list would certainly solve most of the current
pain points of the converter documentation. I probably would even leave
out the purpose column and instead relying on a descriptive name + the
current long form description. That should make maintaining the
documentation a bit easier. Most the the purposes you listed there are
painfully tautological. I mean ... it's obvious that hash.crc32 will
create a CRC-32 hash.

I believe that the short names of the more uncommon converters should be
deprecated in the long run, though. This keeps the list nice and tidy
and makes the configuration more self-documenting. Especially since the
current short names are not ideally named as we established in this
conversations.

Best regards
Tim Düsterhus

Re: Naming and discoverability of fetches and converters

Reply via email to