Andreas Tille:
Hi Niels,
at first sorry for my late answer.
At Thu, May 09, 2024 Niels Thykier wrote:
[...]
For me, lintian fails in all roles it has. It is not a good tool for newbies
to get help, since it can only test build artifacts. As an example, your
feedback loop is a full package build followed by unpacking the package just
so lintian can tell you have a typo on line 4. That is a massive waste of
resources - notably developer time and mental bandwidth.
I understand your point about having a tool that checks the debian/ dir for
issues like spelling errors, binary files in the upstream source, and other
concerns right within the packaging tree before the build starts. However, I
don't understand why you mention newbies in this context.
My core argument is that the feedback cycle is excruciatingly complicated and
slow compared to what it needs to be for validation of "debian/*" files.
In my view, the problem is amplified for newcomers in multiple areas.
[...]
As a consequence,
people now get auto-rejects when uploading because lintian on the FTP master
server does not produce the same output as current lintian in stable or
newer.
I think it's a bit unfair to blame lintian about the fact that its old
versions do not do a proper job when it comes to checking newer packages.
Is it now? When I maintained lintian, I was of the understanding that
the dak usage was an explicit use-case we, as lintian maintainers, were
expected to support. In my time, I would have considered this situation
as an RC bug against lintian if I had this change and the FTP masters
were unable or unwilling to install the -backports version of lintian.
On the other side of the "unfairness" coin, I feel it is unfair to have
people spend volunteer time being stuck in a painful cycle of "It works
on my machine, but dak rejects it because lintian is not updated on the
FTP masters machine" for which they are expected to ignore lintian
warnings locally to get out of (you need overrides in the old format,
which the new lintian then complains about - damned if you do, damned if
you don't). Those are volunteers that wasted their Debian time being
caught between lintian and dak and, in my book, that was much more
unfair than having lintian (or dak) and its maintainers own up to it.
I feel we, as a distribution, should ensure such problems do not happen.
As stated, in my time as a lintian maintainer, I felt the responsibility
was with lintian and that is why I blame lintian.
Maybe times have changed here and we, as a distribution, no longer hold
lintian accountable here. Not sure who is then, but maybe that is part
of why this problem has existed for so long.
(For the record, I think the ship sailed on this one. I am not expecting
Axel to go retroactively fix this problem on the lintian side. I expect
us not to repeat this mistake again)
[...]
Especially for the editor support
related parts, where people get instant feedback both on issues and the fix,
automatic reformatting on save and completion suggestions. None of which
lintian or wrap-and-sort are capable of.
If you ask me personally I'm absolutely happy about a policy checker that
simply reports issues. I'm fine with firing up an editor in some other
terminal and be done. Maybe I'm missing your point but for me that's a
non-issue. Or is your comparison with wrap-and-sort rather aimed at
some tool that automatically fixes the issues it has found and I can check
the changes afterwards with `git diff`? Or something like the janitor tools
that even commit changes?
I feel my point is not coming across at all and that is frustrating me a
bit.
Imagine you need to change `debian/control` for some reason regardless
of the situation that triggered this. You open up your editor and do the
change. In the process, you make a mistake.
The current workflow is:
1) Edit file (introducing mistake)
2) No feedback in the editor, so:
a) You save the file
b) Build an artifact that lintian can check
c) Run lintian to get the feedback
3) You correct the mistake.
4) Rinse and repeat all the sub-steps of 2) to validate there are no
mistakes.
This is the workflow you have today with lintian. And it applies equally
to all kinds of mistakes from policy violations, to textual or semantic
typos.
Now, I would like you to step away from the status quo. What this
workflow *should* have been in my view is:
1) Edit file (introducing mistake)
2) Editor shows a "Here is a mistake"-marker.
3) You correct the mistake (either manually or via a quick fix)
4) Editor removes the "Here is a mistake"-marker.
5) Save the file
Notice here that I do not need to leave my editor to get feedback. I get
it automatically, so I cannot forget it nor am I inclined to skip the
check in a hurry. This is the crux of my problem with the status-quo
feedback loop. I have to *actively* ask for feedback. I have to wait for it
too, which becomes a paper cut.
These are an unnecessary mental burden and paper cuts for a
considerable part of problems you can introduce via editing `debian/*`
files. IDEs have solved this problem very well via their near instant
feedback loops. I feel we are long overdue for that.
Similarly, when you consider the reformatting flow of today, the flow is:
1) Edit file
2) Save file.
3) Run `wrap-and-sort` to reset formatting.
- Where I, by the way, have to manually pass the correct formatting
options.
In the workflow I want, the cycle is:
1) Edit file
2) Save file, which causes the editor to reformat automatically *).
Here, I do not have to remember to reformat the file. The editor does it
for me. It is automatically correct rather than corrected through active
manual labor on my part.
Obviously, the status quo workflow is possible. We have been doing it
for years. However, we should not make a human do the work of a machine.
Make the machine do what it does best; follow the same procedure every
time. This enables us to free up mental bandwidth of our human
volunteers for other things.
*) For packages that have opted in to automatic styling, since this is
not a mandatory thing. Stating this explicitly to avoid the conversation
derailing into a question of this being imposed.
[...]
But even if I am not successful with
`debputy`, I cannot imagine I would consider returning to lintian. It does
not scratch my itch and years of issues (some of which are still unfixed)
have made me not want to have anything to do with the tool.
[...]
Given your very interesting input we actually need people who are able to
dedicate quite some time on restructuring lintian in a way that respects the
fact that some checks can be done / are done by some other tool on source
level. Alternatively lintian itself could be modularised to rather do what
you want.
Both in-editor feedback and the "debian files of an unpacked source
tree" are the parts I am trying to cover with `debputy` (via `debputy
lsp server` + `debputy lint/reformat` respectively)
I do not see lintian expanding to in-editor feedback. It is a massive
undertaking on its own. Given no one has solved the "run lintian on an
unpacked source tree" yet, which would be a prerequisite and also a
considerable undertaking on top, I doubt we will ever see it. I also do
not see any noteworthy benefit of attempting direct code reuse from
lintian at this step.
When you work on in-editor feedback, you will need at least:
1) A lenient parser that keeps track of all sorts of things like
syntax errors, white space, and comments that are usually the first
things your parser throws away to keep things simple. Ideally, it
also:
- supports reading a string or a list of lines, since the editor
content is not always persisted to the file system. Instead, you
get it from "somewhere else" (fed via socket in the LSP case)
- continues after syntax errors, since otherwise you only get one
error on syntax errors and most other feedback disappears, which
can be annoying to the user. Especially important for completion
since the half-finished typing might be syntactically invalid.
(Also, inserting a field in a deb822 stanza will temporarily split
the stanza into two where at least one of them will definitely
be invalid. You will want to be able to compute the completion
as-if the stanzas are not split despite the file being
"stanza, empty line / syntax error, stanza")
2) Additionally, you need to know file ranges of everything. One thing
is identifying that the `foriegn` value in the `Multi-Arch` field
was a typo of `foreign`. But for editor support, you have to tell
the editor where to put the marker. That range is different in all
of the cases below:
Multi-Arch: foriegn
Multi-Arch:foriegn
Multi-Arch:
# Comment for the sake of the argument; probably breaks
foriegn
In all cases, the marker should be on the `foriegn` word because
that is where the mistake is. If you are lucky, you get the line number
where `Multi-Arch:` appears and then you get to retrace things
manually. That gets even more complicated for non-string types or
where the parser "cleans" up things for you. As an example, with most
deb822 parsers, it is hard to tell `Multi-Arch:foreign` apart from
`Multi-Arch: foreign`, since the white space is to be trimmed in
that particular case.
Note that ranges go two ways. For diagnostics (linting), you tell the
editor where the marker goes. For completion and hover docs, the
editor tells you where the user is and you have to figure out what
is at that point (file "debian/control, line 22, column 14"). This
means you need a two-way mapping between content and position.
Here, lintian only does one-way mapping, and it only does basic
positioning (like line or line + column). For code reuse, it would
have to provide the full range of each issue.
3) You will need a lot of extra metadata that no one else will need.
As an example, a simple linter might get away with knowing that
"Multi-Arch" is a known field and has 4 allowed values. A complex
one would know about 4 values with one of them being conditional
on the Architecture field (which is less trivial to share in
data-only format). If you do an on-line editor feature with:
- hover docs, then you need the main documentation you want to show
the user for the field and each of the values (depending on what
the user requests docs for). Hover docs are partially static and
partially dynamic data, which makes general purpose sharing of
this data less trivial.
- completion, then you may want to have a one-liner documentation
for the values. Maybe some sorting hints to the editor, so it
knows it should de-emphasize "allowed". Additionally, you want
to track whether the values you offer are allowed in this context
(which for Multi-Arch means checking the `Architecture` field,
while for `Protected` it is static metadata that `no` is the
default and the default would trigger a warning.)
- In all of the above cases, you also want fields / data about
things you cannot check. A linter does not need to know about
all fields it cannot check (other than maybe for field name
canonicalization purposes, a.k.a. "cute-field"). In the editor
support, every known field is now also part of the completion
"vocabulary" and hover docs may still be useful.
4) Mentally, the structure of your work will be built around the user
interacting with the editor. That is, you will be forced into an
event driven architecture. Latency is visible to the user and will
annoy them. A full second is a long wait at this point.
Related, the user typing is sometimes multiple events because the
user happened to type a bit too slow or maybe they stopped typing
midway. So you want support for stopping long running diagnostics,
so you do not build up a queue of pending but now irrelevant
diagnostics.
Lintian, for comparison, is entirely in a batch driven architecture,
where latency of most steps was never important.
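
The "stop long running diagnostics" part of item 4) can be sketched roughly as below. This is only an illustration using Python's `asyncio`, not how `debputy` or any LSP library does it; `DiagnosticsScheduler`, `toy_lint` and the timing values are made up for the example.

```python
import asyncio


class DiagnosticsScheduler:
    """Keep at most one pending lint run per document (hypothetical helper)."""

    def __init__(self, lint, delay=0.2):
        self._lint = lint      # callable: text -> list of diagnostics
        self._delay = delay    # quiet period before linting starts
        self._tasks = {}       # document URI -> asyncio.Task
        self.published = []    # (uri, diagnostics) pairs, kept for the demo

    def on_change(self, uri, text):
        # A new edit makes any pending run for this document irrelevant.
        old = self._tasks.get(uri)
        if old is not None and not old.done():
            old.cancel()
        self._tasks[uri] = asyncio.create_task(self._run(uri, text))

    async def _run(self, uri, text):
        # Wait for the user to pause. If they keep typing, the cancel()
        # above interrupts this sleep and nothing is published.
        await asyncio.sleep(self._delay)
        self.published.append((uri, self._lint(text)))


def toy_lint(text):
    # Stand-in check: flag the misspelling used as the example in this mail.
    return ["typo: foriegn"] if "foriegn" in text else []


async def demo():
    sched = DiagnosticsScheduler(toy_lint, delay=0.05)
    # Three change events arrive faster than the quiet period.
    for text in ("Multi-Arch: f", "Multi-Arch: forie", "Multi-Arch: foriegn"):
        sched.on_change("debian/control", text)
        await asyncio.sleep(0.01)
    await asyncio.sleep(0.2)
    return sched.published


published = asyncio.run(demo())
```

Only the last of the three change events results in published diagnostics; the two earlier runs are cancelled before they do any work.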
This is beyond the particular "idiosyncrasies" of the LSP
specification, tracking what the editor supports, when to provide
what information to the editor, etc.
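
To make item 2) a bit more concrete, here is a minimal sketch of finding the range of the `foriegn` value in the three layouts shown there, plus the reverse question of whether a cursor position is inside that value. It is a toy, not a deb822 parser and not `debputy` code; `value_range` and `is_inside_value` are hypothetical names, and lines and columns are 0-based with an exclusive end column.

```python
import re

WORD = re.compile(r"\S+")


def value_range(lines, field):
    """Return (line, start_col, end_col) of the first word of the field value.

    Returns None if the field is absent or has no value.
    """
    prefix = field.lower() + ":"
    for lineno, line in enumerate(lines):
        if not line.lower().startswith(prefix):
            continue
        # Value on the same line as the field name, with or without a space.
        match = WORD.search(line, len(prefix))
        if match:
            return (lineno, match.start(), match.end())
        # Otherwise look at continuation lines, skipping comment lines.
        for cont_no in range(lineno + 1, len(lines)):
            cont = lines[cont_no]
            if cont.startswith("#"):
                continue
            if not cont.strip() or cont[0] not in " \t":
                return None  # end of stanza or start of the next field
            match = WORD.search(cont)
            return (cont_no, match.start(), match.end())
        return None
    return None


def is_inside_value(lines, field, lineno, col):
    """Reverse direction: is the cursor at (lineno, col) on the field value?"""
    found = value_range(lines, field)
    return found is not None and found[0] == lineno and found[1] <= col < found[2]


cases = {
    "space": ["Multi-Arch: foriegn"],
    "no space": ["Multi-Arch:foriegn"],
    "continuation": ["Multi-Arch:", "# a comment", " foriegn"],
}
ranges = {name: value_range(text, "Multi-Arch") for name, text in cases.items()}
# space        -> (0, 12, 19)
# no space     -> (0, 11, 18)
# continuation -> (2, 1, 8)
```

The same typo gives three different ranges, which is exactly the information lost when a parser hands back only the trimmed value and the line number of the field.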
I can tell you with absolute certainty that lintian is ready for
basically none of the above. It was not built for it and parts of this
are an absolute pain to do. You do that because you have to do it to
work with the editor support, not to support another project while you
are already drowning in work trying to keep the project afloat.
Additionally, for a linter (hammer), everything is a diagnostic (nail).
For an editor integration, you have a more varied toolbox. As an
example, `debputy` does not emit diagnostics for trailing white space
like lintian does (with `--pedantic` as I recall). Instead, `debputy`
fixes them automatically on saving where relevant. Because that is a
better solution for the user when you are not forced to solve everything
like a linter (hammer).
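
As a rough illustration of that difference (a sketch only, with hypothetical function names, not the actual `debputy` or lintian code), the two approaches to trailing white space look like this:

```python
def trailing_whitespace_diagnostics(text):
    """Linter approach: report each offending line, leave the fix to the user."""
    return [
        (lineno, "trailing white space")
        for lineno, line in enumerate(text.split("\n"))
        if line != line.rstrip(" \t")
    ]


def fix_on_save(text):
    """Editor approach: return corrected text, nothing for the user to read."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


sample = "Source: foo \nSection: utils\t\nPriority: optional\n"
reported = trailing_whitespace_diagnostics(sample)
fixed = fix_on_save(sample)
```

With the first function the user gets two findings to act on; with the second, the saved file is simply clean.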
Accordingly, even if it was possible to share all the lintian code, I
would not want all of it, meaning that lintian would now need conditions
for "things `debputy` wants vs. things `debputy` does not". Again, not
the thing you need while trying to keep your project afloat.
[...]
PS: In my view, the bleeding of lintian's quality started long before Axel
joined the lintian maintenance team and I do not fault Axel for being unable
to stop the bleeding. In my view, only a hero could have "managed" that at
the expense of their mental health.
Thanks a lot for your mental support to Axel which I confirm from my side.
To draw some conclusion out of the discussion: We need to enhance the way
we are checking our packages for conformance with our policy. You made
clear that quite a part can be done at source level. I'm not fully sure
whether your main focus is on the time inside the build process or the
editing features you mentioned.
The `debputy` framework has two different "legs" here. One is the
in-editor feedback with some batch counterparts for CI pipelines, which
aims to be generally applicable to all packages.
The other leg is `debputy` self-checking the packaging instructions for
packages built with `debputy`. In a sense, this also counts as policy
checking but it is not a static analysis and therefore is not comparable
to lintian.
It is also not clear to me whether you are
questioning the general architecture like for instance the rule sets that
are in /usr/share/lintian/data. IMHO this is a valuable set of rules that
can be used by alternative tools as well. Do you agree with this or not?
I find that data to be of questionable value to my work at the present
time or other tools in this area:
1) I do not remember lintian ever committing to these being part of
its API. Indeed, I see some files that have changed format since
my time there and they are often also engineered to fit lintian-specific
needs rather than being general purpose data files.
2) A large part of the files would not be relevant to my work since
I am not looking at upstream code or packaged artifacts.
3) In my work, I would need a lot of extra auxiliary metadata that lintian
will not need (per my remarks above on doing your own editor
integration).
Obviously, there could be value in sharing rules, data and metadata of
this kind with other interested projects. Jelmer and I already discussed
this possibility in relation to `lintian-brush`. However, it is not
something solved by simply declaring `/usr/share/lintian/data` as stable
API. Instead, I would rather extract subsets of it into a general
purpose data package as needed.
Ideally one where we can release the data faster than checkers, so we do
not get the annoying effect where a new debian-policy upload leaves our
static analysis tools out of date for weeks or even months.
Side-bar: This debian-policy problem is one reason why `debputy` does
not flag "newer-standards-version" as a problem (only older). I do not
want to repeat this problem in `debputy`.
It is a trade-off, because a typo could make the version too new by
mistake and that would be silent in `debputy` at the moment. So I am
definitely interested in outsourcing part of the data.
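
The trade-off can be sketched as follows. This is an illustration under assumptions, not the `debputy` implementation: the function names are hypothetical and `4.7.0` is just an example value for "the newest policy version the checker knows about".

```python
def parse_version(text):
    """Turn '4.6.2' or '4.6.2.0' into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def check_standards_version(declared, known_latest):
    """Return a warning if `declared` is older than `known_latest`, else None.

    A declared version newer than what the checker knows about is
    deliberately not reported.
    """
    # Only the first three components are significant in Standards-Version.
    if parse_version(declared)[:3] < parse_version(known_latest)[:3]:
        return f"Standards-Version {declared} is older than {known_latest}"
    return None


older = check_standards_version("4.6.1", "4.7.0")    # reported
newer = check_standards_version("4.7.1", "4.7.0")    # silent: checker may be stale
typo = check_standards_version("47.0.0", "4.7.0")    # silent too: the trade-off
```

The last case is the typo scenario: it is indistinguishable from a legitimately newer version unless the checker has up-to-date data.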
As I wrote in my other mail in this thread[1] I could imagine some policy
checker step after dh_clean. When thinking twice about it another step
could be done before dh_builddeb which could detect lots of issues before
the package is built and can save the unpackaging step. Are you targeting
this as well?
Kind regards and thanks a lot for your inspiring input
Andreas.
[1] https://lists.debian.org/debian-devel/2024/05/msg00162.html
No, I am not targeting this for `debhelper`. If you build a package with
`debputy` instead of `debhelper`, there are some built-in self-checks of
the provided packaging instructions compared to the "about to be
produced"-package. It is conceptually similar to `dh_install` erroring
out when you reference `usr/bin/foo` and `dh_install` cannot find said file.
It would not be difficult to add some form of policy checking layer on
top of this, though the question is what we want to check at this point
where the helper should not just fix it instead. If the tool can fix it,
then it is better than "here is a problem for you to read up on and then
fix manually even though there was only one obvious solution". One thing
requires brain-cells, the other does not.
My end goal with `debputy` is that the average contributor should spend
less brain-cells on packaging. That way, a contributor gets a better
"mileage" than they do today. That is why I am a bit hesitant about
doing an "in-build policy checker". Though, feel free to present concrete
cases and I will consider it.
Best regards,
Niels