sbp commented on pull request #517:
URL: 
https://github.com/apache/incubator-ponymail/pull/517#issuecomment-679952122


   @sebbASF
   
   It is not yet documented why the command line list ID would need to be 
present in the permalink. Am I right in thinking that the following is the only 
use case?
   
   Consider an mbox archive whose messages contain no list IDs in common with 
the command line list ID imposed by the administrator. All of its messages in 
Ponymail are later lost, but the original mbox archive file is still available. 
Since the messages in Ponymail were lost, the command line list ID is also 
lost. But since the command line list ID was present in permalinks, if a user 
of the list has any such permalink available to them then the command line list 
ID can be recovered.
   
   I can think of no other use case.
   
   There are far better data recovery strategies available. One could, for 
example, maintain a mapping from each command line list ID to any one DKIM ID 
that appears only within that list. This is suitable in the case where an entire 
archive is expected to be recovered. Such a mapping file would be extremely 
small, on the order of KiB, and would therefore be easily replicated across 
many systems.
   
   If only individual messages are expected to be recovered, then a mapping 
of command line list IDs to all DKIM IDs would be necessary. This would only 
require storing sixteen bytes for every email in the system, so even an archive 
with a million emails would only require a mapping file of about 15 MiB.
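
   The size estimate above can be checked with a short calculation. This is 
only a sketch: it assumes one fixed 16-byte ID per message and ignores the list 
ID keys and any file format overhead. The name `mapping_size_mib` is 
hypothetical.

```python
# Back-of-the-envelope size of a mapping that stores one ID per message.
# Assumption: every message contributes exactly one fixed-width 16-byte ID;
# list ID keys and file format overhead are ignored.
ID_BYTES = 16
MIB = 1024 * 1024


def mapping_size_mib(message_count: int, id_bytes: int = ID_BYTES) -> float:
    """Return the raw size in MiB of message_count fixed-width IDs."""
    return message_count * id_bytes / MIB


if __name__ == "__main__":
    # One million messages at sixteen bytes each.
    print(f"{mapping_size_mib(1_000_000):.2f} MiB")
```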
   
   Even in the original suboptimal strategy, it is not necessary to make the 
command line list ID a mandatory part of a permalink. It could instead be made 
optional, like labels used in Amazon URLs, some weblog software, and on some 
news sites, as the following examples demonstrate:
   
   ```
   https://www.amazon.com/Apache-Definitive-Guide-Ben-Laurie/dp/0596002033
   https://www.amazon.com/Anything-Can-Go-Here/dp/0596002033
   https://www.amazon.com/dp/0596002033
   
   https://lobste.rs/s/j7p2ow/what_are_you_doing_this_week
   https://lobste.rs/s/j7p2ow/anything_can_go_here
   https://lobste.rs/s/j7p2ow
   
   https://www.reuters.com/article/apache-moves-on-traffic-server-machine-learning-projects-idUS57202199920100504
   https://www.reuters.com/article/anything-can-go-here-idUS57202199920100504
   https://www.reuters.com/article/idUS57202199920100504
   ```
   
   Amazon and Reuters use an infix pattern, whereas Lobsters uses a suffix 
pattern. Users could strip the Ponymail list ID, whether command line or 
archive metadata derived, from the permalink:
   
   ```
   https://lists.apache.org/thread/MTIzNDU2Nzg5MDEyMzQ1Ng/dev.project.apache.org
   https://lists.apache.org/thread/MTIzNDU2Nzg5MDEyMzQ1Ng/anything.can.go.here
   https://lists.apache.org/thread/MTIzNDU2Nzg5MDEyMzQ1Ng
   ```
   
   Or if the malleability of `anything.can.go.here` is undesirable, the UI 
software could check that the message actually appears in the list named by the 
optional part of the URL. But I think that, as @rbowen noted, the first thing 
any user wants to do with a URL that's too long to easily share is to shorten 
it, either by taking out optional components or by submitting it to a link 
shortener.
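
   To make the optional component concrete, here is a minimal sketch of how 
such permalinks might be resolved. It is not Ponymail code: the `/thread/` path 
layout is taken from the examples above, while `parse_thread_permalink`, 
`resolve`, and the `list_of` lookup are hypothetical names introduced for 
illustration.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urlsplit


def parse_thread_permalink(url: str) -> Tuple[str, Optional[str]]:
    """Split a permalink into (message ID, optional list ID label)."""
    parts = [p for p in urlsplit(url).path.split("/") if p]
    if len(parts) not in (2, 3) or parts[0] != "thread":
        raise ValueError(f"not a thread permalink: {url!r}")
    label = parts[2] if len(parts) == 3 else None
    return parts[1], label


def resolve(url: str, list_of: Optional[Callable[[str], str]] = None) -> str:
    """Return the message ID; the label never affects which message is found.

    If a lookup function is supplied (hypothetical: message ID -> list ID),
    a label that names the wrong list is rejected instead of ignored.
    """
    message_id, label = parse_thread_permalink(url)
    if list_of is not None and label is not None:
        if list_of(message_id) != label:
            raise LookupError(f"message is not in list {label!r}")
    return message_id
```

   Without `list_of`, all three example URLs resolve to the same message. With 
it, the `anything.can.go.here` form is rejected while the short form still 
works.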
   
   *Links are themselves UI, and they ought to be designed in a user friendly 
way.*
   
   Links which are too long are not user friendly, and this is why sites use 
IDs like `0596002033`, `j7p2ow`, and `idUS57202199920100504`, to recapitulate 
the actual examples mentioned above. They don't use mandatory IDs like 
`MTIzNDU2Nzg5MDEyMzQ1Ng_dev.project.apache.org`. Even `MTIzNDU2Nzg5MDEyMzQ1Ng` 
could be regarded as too long, but unlike Amazon, Lobsters, and Reuters we have 
the constraint that we would like to be able to generate the ID again from the 
content, which means using a hash, which means considering the hash security; 
and indeed I provided an informal analysis earlier in this thread.
   
   I would very much like wider review and more discussion of this pull 
request. I notice, however, that the 40 or so messages, from four contributors, 
currently in this thread compare rather favourably with the following numbers of 
messages in the threads of all previous PRs on Ponymail:
   
   **0, 0, 0, 5, 2, 3, 0, 2, 1, 0, 3, 3, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 
2, 0, 2, 2, 1, 0, 3, 2, 10, 4**
   
   Combined, this is 54 messages across every single PR, merged or unmerged. I 
count that 17 out of 35 PRs were merged. I also counted the number of 
participants in the threads of *only the merged* PRs, giving the following 
figures:
   
   **1, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2, 3, 3, 2, 3, 2, 2**
   
   I notice, therefore, that the current PR has already received far more 
review than any existing PR, almost surpassing their combined number of 
messages, and that the number of contributors to the thread already surpasses 
that of every existing merged PR.
   
   Despite this, I repeat the call for wider review. Clearly this is a 
substantial contribution, and many of the prior PRs were trivial. I would 
especially like, for example, somebody to audit the behaviour of my algorithm 
vs the reference algorithm in the `dkimpy` package, and to provide a more 
formal analysis of the security parameters of the hash.
   
   It is also clear that this PR needs to be modified before it can be 
accepted. As I understand it, the following modifications could aid consensus:
   
   * The hash encoding could be converted to base64
   * The hash digest length could be 128 bits, encoded as 22 characters
   * The pepper mechanism should be removed
   * The command line List ID should not be added to the message before hashing
   * The algorithm could potentially also be renamed
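
   As a sketch of the first two points, a 128 bit digest encodes to 22 
characters once base64 padding is dropped. The choices below are assumptions 
for illustration only and are not the algorithm in this PR: SHA-256 truncated 
to 16 bytes stands in for the real hash, the URL-safe base64 alphabet is used 
because the ID appears in a path, and `encode_digest` and `short_id` are 
hypothetical names.

```python
import base64
import hashlib

DIGEST_BYTES = 16  # 128 bits


def encode_digest(digest: bytes) -> str:
    """Encode raw digest bytes as unpadded URL-safe base64."""
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def short_id(content: bytes) -> str:
    """Derive a 22-character ID from content.

    Assumption: SHA-256 truncated to 128 bits is a stand-in here, not the
    hash construction proposed in the pull request.
    """
    digest = hashlib.sha256(content).digest()[:DIGEST_BYTES]
    return encode_digest(digest)


if __name__ == "__main__":
    # The example ID used earlier is the encoding of 16 ASCII bytes.
    print(encode_digest(b"1234567890123456"))
    print(len(short_id(b"example message")))
```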
   
   It would also be useful if objecting participants would *concisely* state 
all of their remaining objections.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

