Jackie-Jiang commented on a change in pull request #7246:
URL: https://github.com/apache/pinot/pull/7246#discussion_r683852084



##########
File path: 
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/PrimaryKey.java
##########
@@ -35,6 +36,10 @@ public PrimaryKey(Object[] values) {
     return _values;
   }
 
+  public byte[] asBytes() {
+    return SerializationUtils.serialize(_values);

Review comment:
       I think this util uses the java ser/de to serialize the values. Can it 
serialize the `ByteArray` values?
   Java ser/de is usually not efficient, so we might want to build our own 
custom serializer. 

##########
File path: 
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/HashUtils.java
##########
@@ -0,0 +1,51 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.utils;
+
+import com.google.common.hash.Hashing;
+import org.apache.pinot.spi.config.table.UpsertConfig;
+import org.apache.pinot.spi.data.readers.PrimaryKey;
+import org.apache.pinot.spi.utils.ByteArray;
+
+
+public class HashUtils {
+  private HashUtils() {
+  }
+
+  public static byte[] hashMurmur3(byte[] bytes) {
+    return Hashing.murmur3_128().hashBytes(bytes).asBytes();
+  }
+
+  public static byte[] hashMD5(byte[] bytes) {
+    return Hashing.md5().hashBytes(bytes).asBytes();
+  }
+
+  public static Object hashPrimaryKey(PrimaryKey primaryKey, 
UpsertConfig.HashFunction hashFunction) {

Review comment:
       This method is specific for upsert handling, so I'd suggest move it out 
of this general util class.

##########
File path: 
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/UpsertConfig.java
##########
@@ -40,9 +40,16 @@
     APPEND, INCREMENT, OVERWRITE, UNION
   }
 
+  public enum HashFunction {
+    NONE, MD5, MURMUR3

Review comment:
       Suggest using `null` to represent disabling hashing. We don't expect 
people to explicitly put `NONE` as the hash function.

##########
File path: 
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/PartitionUpsertMetadataManager.java
##########
@@ -239,11 +238,12 @@ public void removeSegment(IndexSegment segment) {
   }
 
   public static final class RecordInfo {
-    private final PrimaryKey _primaryKey;
+    /** stores the primary key of the record or its hash value */
+    private final Object _primaryKey;
     private final int _docId;
     private final Comparable _comparisonValue;
 
-    public RecordInfo(PrimaryKey primaryKey, int docId, Comparable 
comparisonValue) {
+    public RecordInfo(Object primaryKey, int docId, Comparable 
comparisonValue) {

Review comment:
       We should do the hash within the metadata manager instead of on the 
caller side. The primary key handling logic should all be wrapped within this 
class




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to