Re: [tcpdump-workers] [tcpdump] After setjmp/longjmp update

Denis Ovsienko via tcpdump-workers Thu, 24 Sep 2020 19:05:08 -0700

--- Begin Message ---

On Thu, 17 Sep 2020 15:15:25 +0100
Denis Ovsienko via tcpdump-workers <tcpdump-workers@lists.tcpdump.org>
wrote:

> On Sat, 5 Sep 2020 18:20:42 +0200
> Francois-Xavier Le Bail via tcpdump-workers
> <tcpdump-workers@lists.tcpdump.org> wrote:
> 
> > 2) Process all the truncated cases with:
> > ndo->ndo_ll_hdr_len = 0;
> > longjmp(ndo->ndo_truncated, 1);
> > (With a new macro, like 'ND_TRUNCATED' or 'ND_IS_TRUNCATED')  
> 
> The master branch now has a change along these lines. Whilst preparing
> changes to a couple decoders based on that (still work in progress), I
> managed to make some observations, will post as soon as it all looks
> good and makes sense.

Hello list.

This update may be a bit long because it is a digest of about two weeks
of work. I have pushed a few commits today, which represent about the
last 10 days of that work.

This is my attempt to map and to reduce the problem of completing the
conversion to the new means of detecting and handling packet
truncation. Please excuse me if an idea that looks original to me had
been discussed before -- my e-mail backlog is quite bad in some areas.

First I looked at the original conversion plan with more than a
thousand ND_TCHECK*() calls (they are not calls, but I do not have a
more convenient term) remaining to review, then identified and
processed a few obvious patterns (such as ND_TCHECK_LEN(cp,
MAC_ADDR_LEN) before GET_ETHERADDR_STRING(cp)). That was relatively
quick and easy.

One important thing I had realised in the process was that many
ND_TCHECK*() calls still have a valid purpose and must remain along
with the GET_*() macros.

For example, when the encoding has a "pad" field, its value is
irrelevant, but the fact of being (or not being) within the packet
buffer makes a difference, and an ND_TCHECK*() sometimes does just the
right thing.

Another valid use case for an ND_TCHECK*() is a decoder that varies
amount of detail depending on the value of ndo_vflag. For the detailed
output it would use a series of GET_*() calls to dig deep into a
specific sequence of bytes, and for the default output it would just
need to print how many bytes long it is and to check that it is within
the buffer.

One obvious use case where an ND_TCHECK*() call should be removed is
when it tests an entire structure, and then a sequence of GET_*() calls
safely take the entire structure apart one field at a time. In this
case removing the ND_TCHECK*() increases the amount of detail the code
would print when ndo_snapend is in the middle of the structure.

## Problem 1 (P1): determine which ND_TCHECK*() calls must remain and
which should be removed.

After the easy few initial rounds each next pattern-based round turned
out to be for a decreasing amount of ND_TCHECK*() calls and an
increasing amount of complexity around them, so this approach was
unlikely to converge anytime soon, if at all.

To add to that, it was not always clear which ND_THECK*() calls had
already been reviewed and which had not yet, so the total complexity
started to look closer to O(n**2) than to O(n).

## Problem 2 (P2): tell for which ND_TCHECK*() calls P1 has already
been solved, and for which not yet, so the ratio represents the
progress and there is no duplication and no omission.

Then I looked at the task once again and thought that changing all the
ND_TCHECK*() macro definitions to do longjmp() instead of "goto trunc"
would be easy to do, but too difficult to verify, and would leave an
unknown amount of obsolete code in place, so the solution needs to be
something else.

## Problem 3 (P3): make the ND_TCHECK*() calls that must remain use
longjmp() before P1 is solved for every ND_TCHECK*()call.

I thought of adding an alternative set of macros that would do
longjmp(), and updating the code to use these macros instead of the
ones that do "goto trunc". It would provide a clear indication of
progress, although it would be a lot of code churn. But it felt slighty
more doable.

So I decided to try that approach on four files that I had written a
while ago, in order of increasing complexity to see if the method works
out (SLOC stands for lines of code as measured by sloccount):

* print-loopback.c -- a tiny protocol (85 SLOC)
* print-aoe.c -- a small protocol (321 SLOC)
* print-ahcp.c -- a small protocol (329 SLOC)
* print-openflow-1.0.c -- a medium-sized protocol (1768 SLOC)

Because in each case I was familiar with both the protocol encoding and
the code, it was easier to focus on the conversion details.

Before long I realized that I am changing so many lines, but the only
effect is that the "goto trunc" inside a macro becomes "longjmp()"
inside another macro, and eventually the "trunc" label inside a
function becomes unused, so it would be much simpler just to call
longjmp() at the "trunc" label. So I had introduced nd_trunc().

Then I had converted the first three files from the list to use
nd_trunc() and it worked. Most of the difficulty was in the change of
the sizing arithmetics to make calls like ND_TCHECK_LEN(cp, len) work
for invalid packets at any time. That change does not belong to the
original problem space, but in small files it was not too much work and
allowed me to figure the method out. Then I started to apply the method
to the OpenFlow code, and it turned out to be a job bigger than I
expected.

The matter is, when you work on some code for a while, you start to
notice assorted unrelated issues. Trying to fix these issues before the
conversion sometimes makes the conversion more difficult. Not fixing
the issues before the conversion also sometimes makes the conversion
more difficult. So it often takes to apply interpretation to strike a
fair balance. Perhaps in the next printer conversions it will be better
to make a list of things to fix later, and to keep to the shortest
critical path only.

Anyway, eventually I could separate my working copy changes into 5
commits, of which the last one (4e2e9c24) is mostly the conversion to
nd_trunc() (in other words, it solves P3 for the OpenFlow code). Please
see the detailed commit message and the header comment of
print-openflow-1.0.c, it should work well for other TLV-like encodings
too.

The commit before that (b4dbcce0) actualises bounds checks (in other
words, it solves P1 for print-openflow-1.0.c).

In terms of P2, to the best of my understanding, P1 and P3 have been
fully solved for the following files: (I may be wrong and you are
welcome to point the mistakes out):

* print-loopback.c
* print-aoe.c
* print-ahcp.c
* print-openflow.c
* print-openflow-1.0.c

== Summary: the method seems to work well, there is a clean reference
implementation, it should be easy to apply to other printers that
implement similar encodings, there may be encodings that will require
a different method, a notable difficulty in the application is not to
get distracted by fixing unrelated issues right now.

P.S.
As a related idea, it may be possible to use another method instead or
in parallel with the above. Specifically, the ND_TCHECK*() macros can
take a different action depending on whether something is declared or
not.

This could be done similar to the NETDISSECT_REWORKED macro several
years ago: you define a symbol before including netdissect.h, and all
functions in the .c file have a longjmp() inside every ND_TCHECK*()
instead of "goto trunc".

This would be a middle ground between one function at a time and all
files at a time, and it would be a convenient way to solve P3 for a
file for that P1 (and thus implicitly P2) has already been solved.

-- 
    Denis Ovsienko

--- End Message ---

_______________________________________________
tcpdump-workers mailing list
tcpdump-workers@lists.tcpdump.org
https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers

Re: [tcpdump-workers] [tcpdump] After setjmp/longjmp update

Reply via email to