[GitHub] [pulsar] eolivelli commented on a diff in pull request #18668: [improve][io] ElasticSearch sink: conditional id hashing

GitBox Tue, 29 Nov 2022 01:32:13 -0800


eolivelli commented on code in PR #18668:
URL: https://github.com/apache/pulsar/pull/18668#discussion_r1034503124



##########
site2/docs/io-elasticsearch-sink.md:
##########
@@ -52,43 +52,44 @@ The configuration of the Elasticsearch sink connector has the following properti
 
 ### Property
 
-| Name | Type|Required | Default | Description 

Review Comment:
   Did you reformat the table?



##########
pulsar-io/elastic-search/src/main/java/org/apache/pulsar/io/elasticsearch/ElasticSearchSink.java:
##########
@@ -240,20 +240,28 @@ public Pair<String, String> extractIdAndDocument(Record<GenericObject> record) t
             if (id != null
                     && idHashingAlgorithm != null
                     && idHashingAlgorithm != ElasticSearchConfig.IdHashingAlgorithm.NONE) {
-                Hasher hasher;
-                switch (idHashingAlgorithm) {
-                    case SHA256:
-                        hasher = Hashing.sha256().newHasher();
-                        break;
-                    case SHA512:
-                        hasher = Hashing.sha512().newHasher();
-                        break;
-                    default:
-                        throw new UnsupportedOperationException("Unsupported IdHashingAlgorithm: "
-                                + idHashingAlgorithm);
+
+                boolean performHashing = true;
+                if (elasticSearchConfig.isConditionalIdHashing()
+                        && id.getBytes(StandardCharsets.UTF_8).length <= 512) {

Review Comment:
   Do we really need to create the `byte[]` instance here?
   It will generate some garbage.
   
   Maybe you can create the `byte[]` once here and, instead of calling
   `hasher.putString(id, StandardCharsets.UTF_8);`, pass that same array to the hasher.
   I suspect that `putString` will perform the encoding a second time.
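
A minimal sketch of the suggestion, under stated assumptions: the id is encoded to UTF-8 exactly once, the resulting array is used for the length check, and the same array is then fed to the hash function. To stay self-contained it uses the JDK's `java.security.MessageDigest` rather than the Guava `Hasher` from the diff (with Guava, the equivalent call would be `putBytes(byte[])` in place of `putString`). The class name, the `resolveId` helper, and the Base64 output encoding are hypothetical and not taken from `ElasticSearchSink`; only the 512-byte threshold comes from the diff.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical illustration class; not part of the Pulsar code base.
final class ConditionalIdHashing {

    // Threshold from the diff under review: ids of at most 512 UTF-8 bytes are left unhashed.
    static final int MAX_UNHASHED_ID_BYTES = 512;

    // Hypothetical helper: returns the id unchanged or a Base64-encoded SHA-256 digest of it.
    static String resolveId(String id, boolean conditionalIdHashing) {
        // Encode once; the array serves both the length check and the hashing step.
        byte[] idBytes = id.getBytes(StandardCharsets.UTF_8);

        if (conditionalIdHashing && idBytes.length <= MAX_UNHASHED_ID_BYTES) {
            return id; // short enough, no hashing needed
        }

        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            // Reuse the already-encoded bytes instead of encoding the string again.
            // Base64 is an assumption made for this sketch only.
            return Base64.getEncoder().encodeToString(digest.digest(idBytes));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }
}
```

With this shape the only allocation for the encoded id is the single `byte[]`, which addresses both concerns raised above: no throwaway array for the length check, and no second encoding pass.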



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
