Bron Gondwana wrote in
 <[email protected]>:
 ...
 |[.] I have a draft for a method at:
 |
 |https://datatracker.ietf.org/doc/draft-gondwana-dkim2-modification-alegbra/
 |
 |It can be used to describe all "add text" cases quite nicely, as well
 |as wrapped structures where an existing message gets moved into a
 |multipart/mixed with more content at the end. There's still some testing
 |to be done for the most complex cases - but this doesn't have to be
 |a two-way algorithm, it just has to allow describing how to convert
 |a new email body back to the original email body, and I believe this
 |can be done reliably and at a reasonable cost, though it could definitely
 |use some more examples.
 |
 |I'm going to publish an update with another mechanism which reduces
 |the cost of the "remove an attachment" version to at least not fill
 |the headers with tons of junk.  It doesn't reduce the message size
 |though, because you do need to be able to recreate the old message.

I wondered how the bsdiff algorithm would fare for such things.
It is a very old program that has been present in every FreeBSD
system for twenty years and more.  The executable is ~8.5 KB in
total, and it uses the libdivsufsort library, whose source is
72 KB in total.  By default the executable compresses via bzip2
(which accounts for some of those 8.5 KB).
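For readers who have not looked inside bsdiff: a patch boils down to
a list of control triples plus a "diff" block and an "extra" block.
Each triple (x, y, z) says: take x bytes from the old file and add
the corresponding diff bytes to them (modulo 256), then append y
literal bytes from the extra block, then move the read position in
the old file by z.  A minimal Python sketch of that reconstruction
step, assuming the blocks are already decompressed (the header and
the bzip2 layer are left out, and `apply_controls` is a made-up
name, not part of bsdiff):

```python
def apply_controls(old: bytes, controls, diff: bytes, extra: bytes) -> bytes:
    """Rebuild the new file from the old one, bspatch-style.

    Hypothetical helper: `controls` is a list of (add_len, copy_len, seek)
    triples and `diff`/`extra` are the already-decompressed blocks.
    """
    new = bytearray()
    old_pos = diff_pos = extra_pos = 0
    for add_len, copy_len, seek in controls:
        # Add step: old bytes plus the bytewise diff, modulo 256.
        # Positions outside the old file count as 0, as in bspatch.
        for i in range(add_len):
            j = old_pos + i
            base = old[j] if 0 <= j < len(old) else 0
            new.append((base + diff[diff_pos + i]) & 0xFF)
        old_pos += add_len
        diff_pos += add_len
        # Extra step: literal bytes with no counterpart in the old file.
        new += extra[extra_pos:extra_pos + copy_len]
        extra_pos += copy_len
        # Seek step: move (possibly backwards) within the old file.
        old_pos += seek
    return bytes(new)


if __name__ == "__main__":
    # Made-up example: drop the middle part ("DROP") of the old data.
    old = b"keepDROPkeep2"
    controls = [(4, 0, 4), (5, 0, 0)]  # copy 4, skip 4, copy 5
    print(apply_controls(old, controls, bytes(9), b""))  # b'keepkeep2'
```

Removing a chunk such as an attachment thus costs little more than
a seek in the control block, which is why the patch stays tiny.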

For example, if I strip the content of the HTML part of your
message, then removing the attachment added by the IETF mailing
list, shown here as diff(1) output,

  -rw-------  1 steffen wheel 5295 Nov 19 01:50 m0
  -rw-------  1 steffen wheel 4929 Nov 19 01:50 m1

  --- m0  2024-11-19 01:50:20.390006000 +0100
  +++ m1  2024-11-19 01:50:32.441447000 +0100
  @@ -87,16 +87,6 @@ Content-Transfer-Encoding: quoted-printable
   Content-Type: text/html
   Content-Transfer-Encoding: quoted-printable

  ---===============5952072662436684613==
  -Content-Type: text/plain; charset="utf-8"
  -MIME-Version: 1.0
  -Content-Transfer-Encoding: base64
  -Content-Disposition: inline
  -
  -X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KSWV0Zi1ka2lt
  -IG1haWxpbmcgbGlzdCAtLSBpZXRmLWRraW1AaWV0Zi5vcmcKVG8gdW5zdWJzY3JpYmUgc2VuZCBh
  -biBlbWFpbCB0byBpZXRmLWRraW0tbGVhdmVAaWV0Zi5vcmcK
  -
   --===============5952072662436684613==--

via bsdiff results in a 168-byte patch file.
Of course the compressor could be swapped, since the file is identifiable

  # file yy
  yy: bsdiff(1) patch file

and the lzip algorithm, for which there is an IETF draft, is very
good with such text (much better than zstd, which has an RFC but is
about ten times larger).
The author of bsdiff, Colin Percival, is a long-time FreeBSD
developer (among other things) and wrote his Oxford thesis on this
topic:

  http://www.daemonology.net/papers/thesis.pdf
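As for the patch file being identifiable: the classic format starts
with a 32-byte header, the magic `BSDIFF40` followed by three 8-byte
integers (the lengths of the compressed control and diff blocks, and
the size of the new file), each stored little-endian with the sign
in the top bit of the last byte.  That magic is presumably what
file(1) keys on.  A small Python sketch that recognizes and decodes
such a header; the helper names are made up, and in the example only
4929 (the size of m1 above) is real, the block lengths are invented:

```python
BSDIFF_MAGIC = b"BSDIFF40"


def offtin(buf: bytes) -> int:
    """Decode bsdiff's 8-byte little-endian sign-magnitude integer."""
    magnitude = int.from_bytes(buf[:7] + bytes([buf[7] & 0x7F]), "little")
    return -magnitude if buf[7] & 0x80 else magnitude


def offtout(value: int) -> bytes:
    """Encode the other way round (handy for building test headers)."""
    buf = bytearray(abs(value).to_bytes(8, "little"))
    if value < 0:
        buf[7] |= 0x80  # sign lives in the top bit of the last byte
    return bytes(buf)


def read_bsdiff_header(data: bytes) -> tuple[int, int, int]:
    """Return (ctrl_len, diff_len, new_size) or raise ValueError."""
    if len(data) < 32 or data[:8] != BSDIFF_MAGIC:
        raise ValueError("not a BSDIFF40 patch")
    ctrl_len, diff_len, new_size = (offtin(data[i:i + 8]) for i in (8, 16, 24))
    # bspatch rejects negative lengths as a corrupt patch.
    if ctrl_len < 0 or diff_len < 0 or new_size < 0:
        raise ValueError("corrupt patch header")
    return ctrl_len, diff_len, new_size


if __name__ == "__main__":
    header = BSDIFF_MAGIC + offtout(40) + offtout(60) + offtout(4929)
    print(read_bsdiff_header(header))  # (40, 60, 4929)
```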

It must be said that the memory cost of this tool is

   The bsdiff utility uses memory equal to 17 times the size of
   oldfile, and requires an absolute minimum working set size of
   8 times the size of oldfile.

which is quite a lot for those new-style HTML emails with lots of
oversized snapshot images.
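To put numbers on that: the "old file" here is the message as
transmitted, so images count in their base64 form, roughly 4/3 of
their raw size plus line breaks.  A back-of-the-envelope Python
sketch, assuming base64 lines of 76 characters ending in CRLF (the
usual MIME layout) and taking the 17x and 8x factors straight from
the manual page quoted above; the function names are made up:

```python
def base64_mime_size(raw_len: int, line_len: int = 76) -> int:
    """Size of raw_len bytes once base64-encoded in CRLF-terminated lines."""
    encoded = 4 * ((raw_len + 2) // 3)                # 3 raw bytes -> 4 chars
    lines = (encoded + line_len - 1) // line_len      # number of encoded lines
    return encoded + 2 * lines                        # plus one CRLF per line


def bsdiff_memory(old_size: int) -> tuple[int, int]:
    """(typical memory use, minimum working set) per the bsdiff(1) manual."""
    return 17 * old_size, 8 * old_size


if __name__ == "__main__":
    # Hypothetical message: 3 MB of raw image data, headers and text ignored.
    old_size = base64_mime_size(3_000_000)
    typical, minimum = bsdiff_memory(old_size)
    print(f"old file ~{old_size / 1e6:.1f} MB, bsdiff needs ~{typical / 1e6:.0f} MB "
          f"(at least ~{minimum / 1e6:.0f} MB)")
    # old file ~4.1 MB, bsdiff needs ~70 MB (at least ~33 MB)
```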

I want to point out that, as can be seen above, mailing-list
software, and especially Mailman 3 (or, let's say, especially the
Python stuff), tends to re-encode everything in base64.  Others,
for example the one used by The Open Group, have the pesky quirk of
re-encoding to 8-bit, even if that means that "From " quoting
"has" to be applied.
This effectively means that, as things currently stand, the
differences after mangling by software such as mailing-list
managers will be larger than I would expect from reading what has
been said on the diffing topic.
At least that is how things stand today.  I have always hated it,
and maybe if people like you, Mr. Levine and others talk to the
maintainers of MIME-aware mailing-list managers, things will change
over time.
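A small Python illustration of why such re-encoding hurts any
diff-based approach: once a text part is re-encoded in base64, a
line-based diff of the transfer-encoded form reports every line as
changed, although the decoded content is identical; "From " quoting
changes only the affected lines, but still changes bytes.  The sample
body is made up, and `from_quote` implements only the simple
mboxo-style rule, an assumption rather than a description of what any
particular list manager does:

```python
import base64
import difflib


def changed_lines(a: str, b: str) -> int:
    """Count the removed plus added lines a line-based diff reports."""
    ops = difflib.ndiff(a.splitlines(), b.splitlines())
    return sum(1 for op in ops if op.startswith(("- ", "+ ")))


def from_quote(text: str) -> str:
    """mboxo-style quoting (assumed): prefix '>' to lines starting with 'From '."""
    return "".join(">" + line if line.startswith("From ") else line
                   for line in text.splitlines(keepends=True))


if __name__ == "__main__":
    body = "Hi,\nFrom my point of view this works.\nBye\n"  # made-up 8-bit part

    # Mailman-style: the same text, re-encoded as base64.
    b64 = base64.encodebytes(body.encode("utf-8")).decode("ascii")
    print(changed_lines(body, b64))                       # 4: nothing matches
    print(base64.b64decode(b64).decode("utf-8") == body)  # True: same content

    # 8-bit with "From " quoting: one line changed, the rest identical.
    print(changed_lines(body, from_quote(body)))          # 2
```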

  ..

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
|
|And in Fall, feel "The Dropbear Bard"s ball(s).
|
|The banded bear
|without a care,
|Banged on himself fore'er and e'er
|
|Farewell, dear collar bear

_______________________________________________
Ietf-dkim mailing list -- ietf-dkim@ietf.org
To unsubscribe send an email to ietf-dkim-leave@ietf.org