On Wed, Nov 20, 2024, at 08:15, Steffen Nurpmeso wrote:
> that goes out without MIME as such (text/plain 7-bit content-type
> is optional), but both of these two messages came in via ML as
> 
>   Content-Type: text/plain; charset="utf-8"
>   Content-Transfer-Encoding: base64

Yeah, if the source message isn't MIME encoded, Mailman re-encodes it.  It's a 
"detect message type" flag in the code, and it would be trivial to add a config 
option, "don't do that if DKIM2", and instead just MIME-wrap the existing 
message with the existing charset.
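
To make "MIME-wrap instead of re-encode" concrete, here is a minimal Python 
sketch.  It is not Mailman's code: the function name `mime_wrap`, the default 
charset, and the byte-level header handling are all assumptions for 
illustration.  The point is only that the body bytes pass through untouched.

```python
def mime_wrap(raw: bytes, charset: str = "us-ascii") -> bytes:
    """Add minimal MIME headers to a non-MIME message, leaving the body as-is.

    Hypothetical helper; assumes CRLF line endings as on the wire.
    """
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        return raw  # no header/body separator found: leave untouched
    # Collect header field names, skipping folded continuation lines.
    names = {
        line.split(b":", 1)[0].strip().lower()
        for line in head.split(b"\r\n")
        if b":" in line and not line.startswith((b" ", b"\t"))
    }
    if b"mime-version" in names or b"content-type" in names:
        return raw  # already MIME: nothing to wrap
    # Declare what is actually there instead of converting to base64.
    cte = b"7bit" if body.isascii() else b"8bit"
    extra = (
        b"MIME-Version: 1.0\r\n"
        b"Content-Type: text/plain; charset=\"" + charset.encode("ascii") + b"\"\r\n"
        b"Content-Transfer-Encoding: " + cte
    )
    return head + b"\r\n" + extra + sep + body


raw = b"From: a@example.org\r\nSubject: hi\r\n\r\nhello\r\n"
wrapped = mime_wrap(raw)
```

With something like this the only change is three added header lines, so the 
body needs no diff at all.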

> And here the complete text need to be replaced.  This is t2*.txt.
> And *if* (just thinking) that is only a forwarder address like the
> @FreeBSD.org one of Colin Percival, of CPAN, sourceforge or such
> kind of, then *possibly* (but likely not) once again.
> 
> Things are better (for me as a German who effectively writes
> mostly 7-bit ASCII) for the mentioned OpenGroup server, where you
> sent eg text/plain; charset="utf-8"/quoted-printable (because of
> a MIME-folded long line, and a German name with Umlauts etc) and
> only get the 8-bit conversion.
> 
> One more question: how about a language which practically always
> needs UTF-8 with more than one byte per character, ie, an Asian
> language, and such?  For anyone not going the 8-bit way (like
> myself) this is thus either quoted-printable or base64 right away.
> Then reencodings to 8-bit are more expensive.
> 
> Well i do not know, it would have to be tested on real life data;
> of course one could hope for the future, if it is all 8-bit and
> if ML software and such stops this reencoding "madness", then..
> 
> And, of course, all this pretty much only affects the text parts,
> large images and such are base64 data and (pretty much) constant.
> 
> My examples from above, if i pass only the bodies (i will attach
> them) to bsdiff i get
> 
>   -rw-r-----   1 steffen wheel 2167 Nov 19 21:22 t1-i.txt
>   -rw-r-----   1 steffen wheel 2201 Nov 19 21:22 t1-o.txt
>   -rw-------   1 steffen wheel  236 Nov 19 21:22 t1-patch
>   -rw-r-----   1 steffen wheel 8412 Nov 19 21:22 t2-i.txt
>   -rw-r-----   1 steffen wheel 5932 Nov 19 21:22 t2-o.txt
>   -rw-------   1 steffen wheel 4350 Nov 19 21:23 t2-patch
> 
> Hm.  Ok let me remove the bzip2 stuff from bsdiff..  Here is the
> same without, and then running plzip and zstd on the uncompressed
> binary data; this still has the normal header and such (note
> i have not yet looked at all, it may very well be that patches at
> position 0 or "EOT" could be optimized away etc etc.
> 
>   plzip -9 and zstd -19
> 
>   -rw-------   1 steffen wheel  142 Nov 19 21:48 t1-patch-2.lz
>   -rw-------   1 steffen wheel  116 Nov 19 21:48 t1-patch-2.zst
> 
>   -rw-------   1 steffen wheel 4654 Nov 19 21:48 t2-patch-2.lz
>   -rw-------   1 steffen wheel 4577 Nov 19 21:48 t2-patch-2.zst
> 
> It would be interesting to know how your implementation of the
> algorithm works out for those (and the "real" vcsdiff
> implementation i have seen is huge).  Would be cool if it is
> superior, of course.

My code uses a pretty basic Perl diffing tool, but we could use VCDIFF just 
fine too, and have it be an input to that format.  The format really is 
basically just the logic from RFC 3284, but encoded to be readable.

From RFC 3284 there are 3 commands:

   The instructions to encode and direct the reconstruction of a target
   window are called delta instructions.  There are three types:

      ADD:  This instruction has two arguments, a size x and a sequence
            of x bytes to be copied.

      COPY: This instruction has two arguments, a size x and an address
            p in the string U.  The arguments specify the substring of U
            that must be copied.  We shall assert that such a substring
            must be entirely contained in either S or T.

      RUN:  This instruction has two arguments, a size x and a byte b,
            that will be repeated x times.

I didn't bother implementing "RUN" because that seems like something that you 
don't realistically need in emails.  For headers I implemented both plaintext 
"ADD" and base64 ADD to allow encoding everything neatly.
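
As a rough illustration of how ADD and COPY rebuild the original, here is a 
small Python sketch.  The tuple form of the instructions and the names 
`apply_delta` and `ADD64` are hypothetical, not the header syntax, and unlike 
RFC 3284 it only lets COPY address the source (not the target built so far).

```python
import base64


def apply_delta(source: bytes, instructions) -> bytes:
    """Rebuild a target from `source` using ADD/COPY instructions.

    Each instruction is a tuple (illustrative in-memory form only):
      ("ADD", data: bytes)               append literal bytes
      ("ADD64", data: str)               append base64-decoded bytes
      ("COPY", address: int, size: int)  append source[address:address+size]
    """
    out = bytearray()
    for ins in instructions:
        op = ins[0]
        if op == "ADD":
            out += ins[1]
        elif op == "ADD64":
            out += base64.b64decode(ins[1])
        elif op == "COPY":
            address, size = ins[1], ins[2]
            # Refuse ranges that fall outside the source.
            if address < 0 or size < 0 or address + size > len(source):
                raise ValueError("COPY range outside source")
            out += source[address:address + size]
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return bytes(out)


modified = b"Hello, world!\n"
delta = [("COPY", 0, 7), ("ADD", b"DKIM"), ("COPY", 12, 2)]
original = apply_delta(modified, delta)  # b"Hello, DKIM!\n"
```

RUN is left out here for the same reason as above.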

The only other thing I'm wondering is whether a base64-decoding version of COPY 
would make sense for the body.  This would allow putting phrases into the MIME 
preamble rather than into an ADD command, and keep the DKIM2-Body-Diff header 
short.  Maybe "Diff" is the wrong name and I should rename it to "Delta", which 
is the naming in the VCDIFF doc.
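
Roughly what I have in mind for that, as a Python sketch.  Nothing here is 
specified anywhere: the name `copy64` and the idea that the range points at 
base64 text sitting in the preamble are assumptions for illustration.

```python
import base64


def copy64(source: bytes, address: int, size: int) -> bytes:
    """Copy source[address:address+size] and return it base64-decoded.

    Hypothetical COPY variant: the selected range is expected to hold
    base64 text (for example stashed in the MIME preamble).
    """
    if address < 0 or size < 0 or address + size > len(source):
        raise ValueError("COPY range outside source")
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(source[address:address + size], validate=True)


# A phrase stashed as base64 in a (made-up) preamble line.
stash = base64.b64encode("Grüße".encode("utf-8"))
message = b"preamble " + stash + b"\r\n"
restored = copy64(message, len(b"preamble "), len(stash))
```

The header then only carries an address and a size, while the bytes themselves 
live in the message.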

Bron.

> 
> You know, .. the "DKIM now horny" draft i will write anyway
> (because why not, it only extends DKIM/6376) will include diffing,
> it will state that normalized headers shall come first, followed
> by normalized body, all this to be diffed and optionally
> compressed (but decompressing MUST be supported; just today
> Antonio Diaz Diaz posted "Lunzip 1.15-rc1 released", very small
> decompressor only).
> Then, if additional headers are to be included these have to be
> prepended, like trace headers for an email; maybe that special
> case can be optimized away very easily (from bsdiff .. for now).
> 
> Regarding licenses these are BSD 2-clause, MIT, and i think lzip
> is available as public domain (despite the IETF draft variant).
> The nice thing about all that long time matured software is that
> it is very small, statically linking them all in is no problem; on
> FreeBSD:
> 
>   -rw-------  1 steffen wheel 19992 Nov 19 21:58 bsdiff.o
>   -rw-------  1 steffen wheel 14904 Nov 19 21:58 divsufsort.o
>   -rw-------  1 steffen wheel 43928 Nov 19 21:58 sssort.o
>   -rw-------  1 steffen wheel 32848 Nov 19 21:58 trsort.o
>   -rw-------  1 steffen wheel 19000 Nov 19 21:58 utils.o
>   #|f-1400:/tmp/z$ ll bsdiff
>   -rwx------  1 steffen wheel 49200 Nov 19 21:58 bsdiff*
>   #|f-1400:/tmp/z$ strip bsdiff
>   #|f-1400:/tmp/z$ ll bsdiff
>   -rwx------  1 steffen wheel 46200 Nov 19 21:58 bsdiff*
> 
> and on Linux:
> 
>   -rwxr-x--- 1 steffen steffen 105696 Nov 19 22:01 minilzip*
> 
> This is en- plus decoding, statically linked (lzlib).
> 
> --steffen
> |
> |Der Kragenbaer,                The moon bear,
> |der holt sich munter           he cheerfully and one by one
> |einen nach dem anderen runter  wa.ks himself off
> |(By Robert Gernhardt)
> |
> |And in Fall, feel "The Dropbear Bard"s ball(s).
> |
> |The banded bear
> |without a care,
> |Banged on himself fore'er and e'er
> |
> |Farewell, dear collar bear
> 
> _______________________________________________
> Ietf-dkim mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> 
> 
> *Attachments:*
>  • t1-i.txt
>  • t1-o.txt
>  • t2-i.txt
>  • t2-o.txt

--
  Bron Gondwana, CEO, Fastmail Pty Ltd
  [email protected]
