On Wed, Sep 23, 2020 at 11:39:57PM +0200, Tim Düsterhus wrote:
> Willy,
>
> On 23.09.20 at 23:02, Tim Düsterhus wrote:
> > Yes, such a categorized list would certainly solve most of the current
> > pain points of the converter documentation. I would probably even leave
> > out the purpose column and instead rely on a descriptive name + the
> > current long form description. That should make maintaining the
> > documentation a bit easier. Most of the purposes you listed there are
> > painfully tautological. I mean ... it's obvious that hash.crc32 will
> > create a CRC-32 hash.
> >
> > I believe that the short names of the more uncommon converters should be
> > deprecated in the long run, though. This keeps the list nice and tidy
> > and makes the configuration more self-documenting. Especially since the
> > current short names are not ideally named, as we established in this
> > conversation.
> >
>
> Splitting this off into a separate email to allow discussing this
> separately. I feel it might be helpful to have a more organized process
> for documenting those converters, one that would allow generating those
> category lists automatically without requiring a human to add the
> converter in all the appropriate locations, while also taking care of
> the alphabetical order.
>
> As much as I dislike YAML, it feels like an appropriate format here,
> because it allows mixing structured data with arbitrary text. Each
> converter would then be placed into a dedicated YAML file that contains
> everything one needs to know about that converter.
>
> Here's a simple example script in Python:
>
>
> > import sys
> > import textwrap
> > from ruamel.yaml import YAML
> >
> > yaml = YAML()
> > docs = yaml.load_all("""
> > ---
> > signature: sha2(<bits>)
> > tags:
> > - hash
> > - crypto
> > requirements:
> > - USE_OPENSSL
> > --- |
> > Converts a binary input sample to a digest in the SHA-2 family. The result
> > is a binary sample with length of <bits>/8 bytes.
> >
> > Valid values for <bits> are 224, 256, 384, 512, each corresponding to
> > SHA-<bits>. The default value is 256.
> > """)

This is exactly the type of thing I want to avoid. Not only is it unreadable,
it is also unwritable, and it would make the contribution process a pain
for anyone who doesn't know this part very well. Many of the sample fetches
and converters are added by first-time contributors who currently do a good
job at this and don't face major difficulties nor need to learn any particular
language. In your example above I have no idea how I can enter formatted text
with an example, or a multi-line entry or whatever, and *this* is what can
easily discourage anyone. Right now the main benefit of what we have is that
if it reads fine for you it's OK by definition. Sure it encourages blind
copy-paste but it's no big deal, and it's natural that once in a while we
have to refactor things once they start to accumulate. Another common issue
is that some keywords are not properly sorted, but that's to be expected
since not everyone uses a Latin alphabet or knows its ordering perfectly
(and even those using it sometimes get it wrong). Should we care about this?
Clearly not.

Let's keep the entry barrier for contributions low, and have some contributors
like you and me, for whom it's easier to occasionally provide fixes, do the
necessary janitor work, and that's fine. It's the same with Ilya's typo
fixes. When I spot one in a patch I'm about to merge, I fix it before
merging it, otherwise it will be part of the next typo fixes series and
it's not dramatic.

Now, if you think that some scripts can make it easier for some users to
enter some doc, maybe we can place them into scripts/. A script like yours
could take a few arguments like keyword name, arguments, type, description,
see also, and emit the block of formatted text that can be copy-pasted and
serve as a skeleton if the user wants to improve it before committing.
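
To illustrate the kind of helper meant here, a minimal sketch follows. The
function name format_entry, its parameters, and the output layout (signature
on its own line, description indented by two spaces and wrapped at 80
columns, optional "See also" line) are assumptions made for illustration;
this is not an existing script in the tree, and command-line parsing is
left out.

```python
import textwrap


def format_entry(signature, description, see_also=(), width=80):
    """Return a plain-text skeleton for one keyword (hypothetical layout)."""
    lines = [signature]
    # Wrap each paragraph on its own so blank lines in the description survive.
    for index, paragraph in enumerate(description.split("\n\n")):
        if index:
            lines.append("")
        lines.extend(
            textwrap.wrap(
                " ".join(paragraph.split()),  # collapse stray whitespace
                width=width,
                initial_indent="  ",
                subsequent_indent="  ",
            )
        )
    if see_also:
        lines.append("")
        # Sorting here spares the contributor from worrying about ordering.
        lines.append("  See also: " + ", ".join(sorted(see_also)))
    return "\n".join(lines) + "\n"


# Example call; the resulting text is meant to be copy-pasted and then edited.
print(
    format_entry(
        "sha2(<bits>)",
        "Converts a binary input sample to a digest in the SHA-2 family. "
        "The result is a binary sample with length of <bits>/8 bytes.\n\n"
        "Valid values for <bits> are 224, 256, 384, 512, each corresponding "
        "to SHA-<bits>. The default value is 256.",
    ),
    end="",
)
```

The output is ordinary text, so whoever pastes it can still reword it,
add an example, or re-indent it by hand before committing.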

Willy