Bron Gondwana wrote in
 <[email protected]>:
 |On Wed, Nov 20, 2024, at 08:15, Steffen Nurpmeso wrote:
 |> that goes out without MIME as such (text/plain 7-bit content-type
 |> is optional), but both of these two messages came in via ML as
 |> 
 |>   Content-Type: text/plain; charset="utf-8"
 |>   Content-Transfer-Encoding: base64
 |
 |Yeah, if the source message isn't MIME encoded, Mailman re-encodes. \

It is more than that.

 | It's a "detect message type" flag in the code, and it would be trivial \
 |to add a config "don't do that if DKIM2" and instead just MIME-wrap \
 |the existing message with the existing charset.

  ...
 |>   -rw-r-----   1 steffen wheel 2167 Nov 19 21:22 t1-i.txt
 |>   -rw-r-----   1 steffen wheel 2201 Nov 19 21:22 t1-o.txt
 |>   -rw-------   1 steffen wheel  236 Nov 19 21:22 t1-patch
 |>   -rw-r-----   1 steffen wheel 8412 Nov 19 21:22 t2-i.txt
 |>   -rw-r-----   1 steffen wheel 5932 Nov 19 21:22 t2-o.txt
 |>   -rw-------   1 steffen wheel 4350 Nov 19 21:23 t2-patch
 |> 
 |> Hm.  Ok let me remove the bzip2 stuff from bsdiff..  Here is the
 |> same without, and then running plzip and zstd on the uncompressed
 |> binary data; this still has the normal header and such (note
 |> i have not yet looked at all, it may very well be that patches at
 |> position 0 or "EOT" could be optimized away etc etc).
 |> 
 |>   plzip -9 and zstd -19
 |> 
 |>   -rw-------   1 steffen wheel  142 Nov 19 21:48 t1-patch-2.lz
 |>   -rw-------   1 steffen wheel  116 Nov 19 21:48 t1-patch-2.zst
 |> 
 |>   -rw-------   1 steffen wheel 4654 Nov 19 21:48 t2-patch-2.lz
 |>   -rw-------   1 steffen wheel 4577 Nov 19 21:48 t2-patch-2.zst
 |> 
 |> It would be interesting to know how your implementation of the
 |> algorithm works out for those (and the "real" VCDIFF
 |> implementation i have seen is huge).  Would be cool if it is
 |> superior, of course.
 |
 |My code uses a pretty basic Perl diffing tool, but we could use VCDIFF \
 |just fine too - and have it be an input to that format.  The format \
 |really is basically just the logic from RFC 3284; but encoded to be \
 |readable.

Ok, i now downloaded xdelta3, which implements the VCDIFF format of
RFC 3284 (like Google's really big open-vcdiff), and for t1 i get

  Offset Code Type1 Size1  @Addr1 + Type2 Size2 @Addr2
  000000 019  CPY_0     54 S@0
  000054 002  ADD        1
  000055 034  CPY_0     18 S@59
  000073 003  ADD        2
  000075 019  CPY_0     27 S@83
  000102 019  CPY_0    196 S@112
  000298 107  CPY_5     11 S@310
  000309 051  CPY_2     53 S@323
  000362 007  ADD        6
  000368 051  CPY_2     45 S@386
  000413 051  CPY_2    111 S@433
  000524 099  CPY_5    250 S@546
  000774 035  CPY_1     21 T@309
  000795 014  ADD       13
  000808 069  CPY_3      5 T@362
  000813 003  ADD        2
  000815 051  CPY_2     38 S@843
  000853 099  CPY_5    238 S@883
  001091 003  ADD        2
  001093 051  CPY_2   1074 S@1127

so i wildly guess you actually postprocess this output (for now).
The two examples i had posted are smaller when processed with
bsdiff compared to non-postprocessed VCDIFF, that much is plain.
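
For readers unfamiliar with such a listing, here is a minimal sketch
of how a decoder could replay it.  It assumes that S@n means "copy
from the source at offset n", that T@n means "copy from the target
produced so far", and that ADD carries literal bytes which the listing
does not show.  The CPY_n address modes are ignored, and the tuple
layout and the name `apply_delta` are hypothetical, not xdelta3's API.

```python
from typing import Iterable, Tuple


def apply_delta(source: bytes, instructions: Iterable[Tuple]) -> bytes:
    """Replay ADD / COPY instructions to rebuild the target.

    Instruction tuples (hypothetical layout):
      ("add", data)            append literal bytes
      ("copy_s", size, addr)   copy size bytes from source[addr:]
      ("copy_t", size, addr)   copy size bytes from the target so far
    """
    target = bytearray()
    for inst in instructions:
        kind = inst[0]
        if kind == "add":
            target += inst[1]
        elif kind == "copy_s":
            _, size, addr = inst
            if addr < 0 or addr + size > len(source):
                raise ValueError("source copy out of range")
            target += source[addr:addr + size]
        elif kind == "copy_t":
            _, size, addr = inst
            if addr < 0 or addr >= len(target):
                raise ValueError("target copy out of range")
            # Byte by byte, so a copy may overlap the bytes it produces.
            for i in range(size):
                target.append(target[addr + i])
        else:
            raise ValueError(f"unknown instruction {kind!r}")
    return bytes(target)


if __name__ == "__main__":
    toy = [
        ("copy_s", 6, 0),   # "hello "
        ("add", b"big "),   # literal data
        ("copy_s", 5, 6),   # "world"
        ("copy_t", 3, 0),   # "hel", taken from the output so far
    ]
    print(apply_delta(b"hello world", toy))
```

The sizes in the t1 listing add up to 2167 bytes, the size of
t1-i.txt, and the last copy ends at source offset 2201, the size of
t1-o.txt.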

But thank you!
Ciao,

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
|
|And in Fall, feel "The Dropbear Bard"s ball(s).
|
|The banded bear
|without a care,
|Banged on himself fore'er and e'er
|
|Farewell, dear collar bear
