nicoloboschi opened a new pull request, #15428:
URL: https://github.com/apache/pulsar/pull/15428

   ### Motivation
   Elasticsearch limits the size of the document `_id` to 512 bytes. This is a
   hard limit and it is not configurable.
   
   >_id is limited to 512 bytes in size and larger values will be rejected.
   
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html
   
   To bypass this limitation without losing data integrity and uniqueness, a
   new option is introduced to create a hash of the Pulsar record key.
   
   ### Modifications
   * New option `idHashingAlgorithm`, an enum with the values `NONE`, `SHA256`, `SHA512`
   * The hash is computed with the Guava utility class `com.google.common.hash.Hashing`
   
   Note that this option adds CPU overhead to compute the hashed value.
   It is suggested to use this option together with `canonicalKeyFields` to
   guarantee the same hashed value for the same key content regardless of the
   order of the key fields (https://github.com/apache/pulsar/pull/15426).
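
A minimal sketch of the idea, not the code from this pull request. It uses the JDK's `java.security.MessageDigest` in place of Guava's `Hashing` so that it runs without extra dependencies. The class name `IdHashingSketch`, the method `hashId`, and the choice of lowercase hex encoding are assumptions made for illustration. Only the enum values mirror the option described above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the Pulsar code base.
class IdHashingSketch {
    // Mirrors the values of the idHashingAlgorithm option.
    enum Algorithm { NONE, SHA256, SHA512 }

    // Returns the key unchanged for NONE, otherwise a lowercase hex digest.
    // Hex encoding is an assumption; the actual sink may encode differently.
    static String hashId(String key, Algorithm algorithm) {
        if (algorithm == Algorithm.NONE) {
            return key;
        }
        String jcaName = (algorithm == Algorithm.SHA256) ? "SHA-256" : "SHA-512";
        try {
            byte[] digest = MessageDigest.getInstance(jcaName)
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(Character.forDigit((b >> 4) & 0xF, 16));
                hex.append(Character.forDigit(b & 0xF, 16));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 and SHA-512 are available on standard JDKs.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Build a 1000-character key, well above the 512-byte limit.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append('k');
        }
        String longKey = sb.toString();
        System.out.println(hashId(longKey, Algorithm.SHA256).length()); // 64
        System.out.println(hashId(longKey, Algorithm.SHA512).length()); // 128

        // Same key content, different field order: the hashes differ.
        String a = hashId("{\"a\":1,\"b\":2}", Algorithm.SHA256);
        String b = hashId("{\"b\":2,\"a\":1}", Algorithm.SHA256);
        System.out.println(a.equals(b)); // false
    }
}
```

With hex encoding, a SHA-256 digest is always 64 characters and a SHA-512 digest is always 128, so both stay below the 512-byte limit whatever the key length. The last lines of `main` show why a canonical field order matters: the hash is computed over the serialized key, so two keys with the same fields in a different order produce different ids.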
    
   - [x] `doc` 


