sebbASF commented on pull request #517:
URL: 
https://github.com/apache/incubator-ponymail/pull/517#issuecomment-686813208


   This PR affects several distinct aspects of hash generation:
   - the fields used to build the hash source
   - how many bits of hash are generated
   - how the hash is presented for use.
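
To make the three aspects concrete, here is a minimal sketch that treats each one as a separate step. It is not the code in this PR: the function name `sketch_hash`, the NUL separator, the choice of SHA-256 and the hex output are all assumptions made for illustration.

```python
import hashlib


def sketch_hash(fields, bits=128):
    """Illustrative only: hash some message fields and keep `bits` bits as hex."""
    if bits % 4 != 0 or not 4 <= bits <= 256:
        raise ValueError("bits must be a multiple of 4 between 4 and 256")
    # Aspect 1: the fields used to build the hash source. Joining with NUL
    # (an assumption) keeps ("ab", "c") distinct from ("a", "bc").
    source = "\x00".join(fields).encode("utf-8")
    # Aspect 2: how many bits of hash are generated (here: kept).
    full_hex = hashlib.sha256(source).hexdigest()
    # Aspect 3: how the hash is presented. Each hex digit carries 4 bits.
    return full_hex[: bits // 4]
```

Changing any one of the three steps changes the resulting identifier, which is why they need to be evaluated separately.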
   
   To judge its suitability, we first need to decide whether the hash will be 
used for both Permalinks and database ids, or only for Permalinks, because 
that determines which fields must be taken into account.
   
   If it is to be used only as a Permalink, then it does not matter if a few 
non-identical messages generate the same hash, so long as the system can 
display all matches. Obviously the chance of any duplicates should be 
minimised. I think the current design of the PR largely fulfils that 
requirement. [Some edge cases still need to be investigated, where a single 
message is sent to a list twice. This can occur anywhere between the 
originator and the mailing list software, e.g. if a retransmission occurs (I 
hope to add some examples to the test corpus soon). Some lists also have 
aliases, and mails have been copied to the alias; these appear as separate 
emails on the list.]
   
   However, if it is also used as a database id, then the hash must be unique 
for different messages.
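
The difference between the two uses can be sketched as follows. This is a toy in-memory model, not Pony Mail's storage layer: `PermalinkIndex` and `MessageStore` are hypothetical names. The Permalink lookup tolerates a shared hash by returning every match, whereas the store keyed by id has to reject a second, different message with the same id.

```python
from collections import defaultdict


class PermalinkIndex:
    """Hypothetical Permalink lookup: one hash may map to several messages."""

    def __init__(self):
        self._by_hash = defaultdict(list)

    def add(self, short_hash, doc_id):
        self._by_hash[short_hash].append(doc_id)

    def resolve(self, short_hash):
        # Return all matches so the UI can display every candidate.
        return list(self._by_hash.get(short_hash, []))


class MessageStore:
    """Hypothetical database keyed by id: ids must be unique per message."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, message):
        existing = self._docs.get(doc_id)
        if existing is not None and existing != message:
            # A collision here would silently overwrite a different message.
            raise ValueError(f"id {doc_id!r} already used by another message")
        self._docs[doc_id] = message
```

In this model a collision costs the Permalink user one extra click, but for the database it means either an error or a lost message.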
   
   As to the hash presentation, shorter is better for Permalinks, and the 
character set is important.
   However, for the internal database id those are not really a concern.
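
As a rough illustration of the length and character-set trade-off, the sketch below renders the same 128-bit value in three common encodings. The helper name `presentations` and the 128-bit size are assumptions chosen for the example, not what the PR uses.

```python
import base64
import hashlib


def presentations(digest: bytes) -> dict:
    """Render the same digest bytes in three encodings (padding stripped)."""
    return {
        # 0-9a-f: case-insensitive, longest.
        "hex": digest.hex(),
        # a-z2-7: case-insensitive, shorter than hex.
        "base32": base64.b32encode(digest).decode("ascii").rstrip("=").lower(),
        # A-Za-z0-9-_: shortest, but case-sensitive.
        "base64url": base64.urlsafe_b64encode(digest).decode("ascii").rstrip("="),
    }


digest = hashlib.sha256(b"example").digest()[:16]  # keep 128 bits
lengths = {name: len(text) for name, text in presentations(digest).items()}
# lengths == {"hex": 32, "base32": 26, "base64url": 22}
```

A case-sensitive encoding gives the shortest link, but it is easier to corrupt when a Permalink is retyped or passes through software that changes case.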


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

