Github user chunhui-shi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/430#discussion_r56924863
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/HashHelper.java ---
    @@ -17,47 +17,77 @@
      */
     package org.apache.drill.exec.expr.fn.impl;
     
    +import io.netty.buffer.DrillBuf;
    +import org.apache.drill.common.config.DrillConfig;
    +import org.apache.drill.common.exceptions.DrillConfigurationException;
    +
     import java.nio.ByteBuffer;
     import java.nio.ByteOrder;
     
    -public class HashHelper {
    +public abstract class HashHelper {
       static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(HashHelper.class);
    +  public static final String defaultHashClassName = new String("org.apache.drill.exec.expr.fn.impl.MurmurHash3");
    +  static final String HASH_CLASS_PROP = "drill.exec.hash.class";
     
    +  static String actualHashClassName = defaultHashClassName;
    +  static DrillHash hashCall = new MurmurHash3();
    +  static {
     
    -  /** taken from mahout **/
    -  public static int hash(ByteBuffer buf, int seed) {
    -    // save byte order for later restoration
    -
    -    int m = 0x5bd1e995;
    -    int r = 24;
    +    try {
    +      DrillConfig config = DrillConfig.create();
    +      String configuredClassName = config.getString(HASH_CLASS_PROP);
    +      if(configuredClassName != null && configuredClassName != "") {
    +        actualHashClassName = configuredClassName;
    +        hashCall = config.getInstanceOf(HASH_CLASS_PROP, DrillHash.class);
    +      }
    +      logger.debug("HashHelper initializes with " + actualHashClassName);
    +    }
    +    catch(Exception ex){
    +      logger.error("Could not initialize Hash %s", ex.getMessage());
    +    }
    +  }
     
    -    int h = seed ^ buf.remaining();
    +  public static String getHashClassName(){
    +    return actualHashClassName;
    +  }
     
    -    while (buf.remaining() >= 4) {
    -      int k = buf.getInt();
    +  public static int hash32(int val, long seed) {
    +    double converted = val;
    +    return hash32(converted, seed);
    +  }
    +  public static int hash32(long val, long seed) {
    +    double converted = val;
    +    return hash32(converted, seed);
    +  }
    +  public static int hash32(float val, long seed){
    +    double converted = val;
    +    return hash32(converted, seed);
    +  }
     
    -      k *= m;
    -      k ^= k >>> r;
    -      k *= m;
    +  public static int hash32(double val, long seed){
    +    return hashCall.hash32(val, seed);
    +  }
     
    -      h *= m;
    -      h ^= k;
    -    }
    +  public static  int hash32(int start, int end, DrillBuf buffer, int seed){
    +    return hashCall.hash32(start, end, buffer, seed);
    --- End diff --
    
    @jacques-n Do you think this indirection could add performance overhead? It was introduced so that we can test different hash functions without rebuilding the whole project.
    As for the second request, could you elaborate on "this class dereference to the hashCall class field gets eliminated" so that I can understand the concern? Thanks.
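
For context on the question above, here is a minimal sketch of the kind of indirection being discussed: a hash implementation chosen once at class-initialization time and then called through a static field. All names here (`SimpleHash`, `DefaultHash`, the `sketch.hash.class` system property) are hypothetical stand-ins, not Drill's `DrillHash`, `MurmurHash3`, or `drill.exec.hash.class`, and the mixing function is a placeholder rather than a real MurmurHash3. The sketch assumes the field can be declared `static final`; HotSpot can typically treat such a field as a constant once the class is initialized, which is one way the field dereference might be optimized away, whereas the non-final `hashCall` field in the diff may not get that treatment.

```java
// A sketch under stated assumptions; not the code from the pull request.
final class HashIndirectionSketch {

  /** Hypothetical stand-in for the DrillHash type referenced in the diff. */
  interface SimpleHash {
    int hash32(double val, long seed);
  }

  /** Placeholder implementation; NOT Drill's MurmurHash3. */
  static final class DefaultHash implements SimpleHash {
    @Override
    public int hash32(double val, long seed) {
      long bits = Double.doubleToLongBits(val) ^ seed;
      return (int) (bits ^ (bits >>> 32)); // fold 64 bits into 32
    }
  }

  // Resolved once when the class is initialized. Because the field is
  // static final, the JIT is free to treat it as a constant afterwards.
  // "sketch.hash.class" is a hypothetical property name.
  private static final SimpleHash HASH_CALL =
      load(System.getProperty("sketch.hash.class"));

  /** Instantiates the named class, falling back to DefaultHash on any problem. */
  static SimpleHash load(String className) {
    if (className == null || className.isEmpty()) {
      return new DefaultHash();
    }
    try {
      return Class.forName(className)
          .asSubclass(SimpleHash.class)
          .getDeclaredConstructor()
          .newInstance();
    } catch (ReflectiveOperationException | ClassCastException e) {
      return new DefaultHash();
    }
  }

  /** Static entry point that delegates through the field, as in the diff. */
  static int hash32(double val, long seed) {
    return HASH_CALL.hash32(val, seed);
  }

  public static void main(String[] args) {
    System.out.println(hash32(42.0, 0L));
  }
}
```

Whether the extra call is actually measurable would need a benchmark on the real hash paths; the sketch only illustrates where the dereference sits.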


