Hi all,

Please share your thoughts on this issue. In short, Grace Hash Join and Hybrid Hash Join are not being used; we only use Optimized Hybrid Hash Join. Therefore, I think it would be better to remove them.

https://issues.apache.org/jira/browse/ASTERIXDB-1736

---------- Forwarded message ----------
From: Taewoo Kim (JIRA) <j...@apache.org>
Date: Fri, Nov 18, 2016 at 5:06 PM
Subject: [jira] [Created] (ASTERIXDB-1736) Grace Hash Join and Hybrid Hash Join are not being used.
To: notificati...@asterixdb.incubator.apache.org

Taewoo Kim created ASTERIXDB-1736:
-------------------------------------

             Summary: Grace Hash Join and Hybrid Hash Join are not being used.
                 Key: ASTERIXDB-1736
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1736
             Project: Apache AsterixDB
          Issue Type: Improvement
            Reporter: Taewoo Kim
            Assignee: Taewoo Kim

As the title says, Grace Hash Join and Hybrid Hash Join are not being used. I suggest that we remove these two join methods. Here are my findings for these two joins.

1) Grace Hash Join

GraceHashJoinOperatorDescriptor is only called from two places: org.apache.hyracks.examples.tpch.client.join and TPCHCustomerOrderHashJoinTest. One is a Hyracks example (tpch.client) and the other is a unit test. The optimizer never chooses this join during compilation, so it is currently unused.

2) Hybrid Hash Join

During compilation, the optimizer decides whether to use Hybrid Hash Join or Optimized Hybrid Hash Join. If the hash function family is set for every key variable, we use the optimized hybrid hash join. If not, we use the hybrid hash join. In practice, however, the hybrid hash join path is never chosen. Let's check the code.

{code:title=HybridHashJoinPOperator.java|borderStyle=solid}
IBinaryHashFunctionFamily[] hashFunFamilies =
        JobGenHelper.variablesToBinaryHashFunctionFamilies(keysLeftBranch, env, context);

...

boolean optimizedHashJoin = true;
for (IBinaryHashFunctionFamily family : hashFunFamilies) {
    if (family == null) {
        optimizedHashJoin = false;
        break;
    }
}

if (optimizedHashJoin) {
    opDesc = generateOptimizedHashJoinRuntime(context, inputSchemas, keysLeft, keysRight, hashFunFamilies,
            comparatorFactories, predEvaluatorFactory, recDescriptor, spec);
} else {
    opDesc = generateHashJoinRuntime(context, inputSchemas, keysLeft, keysRight, hashFunFactories,
            comparatorFactories, predEvaluatorFactory, recDescriptor, spec);
}
{code}

As we can see, optimizedHashJoin is set to false only when the hash function family of at least one key variable is null.

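To make the selection rule concrete, here is a minimal, self-contained sketch of it. The names HashFunctionFamily, JoinKind and chooseJoin are hypothetical stand-ins invented for this illustration; they are not the Hyracks/Algebricks types, and the sketch only mirrors the null check shown above.

```java
import java.util.Arrays;
import java.util.Objects;

class JoinSelectionSketch {
    // Hypothetical stand-in for IBinaryHashFunctionFamily (illustration only).
    interface HashFunctionFamily {
    }

    // Hypothetical labels for the two runtimes the optimizer can generate.
    enum JoinKind {
        OPTIMIZED_HYBRID_HASH_JOIN,
        HYBRID_HASH_JOIN
    }

    // Same rule as the loop above: the optimized join is chosen
    // unless at least one key has no hash function family.
    static JoinKind chooseJoin(HashFunctionFamily[] families) {
        boolean optimizedHashJoin = Arrays.stream(families).allMatch(Objects::nonNull);
        return optimizedHashJoin ? JoinKind.OPTIMIZED_HYBRID_HASH_JOIN : JoinKind.HYBRID_HASH_JOIN;
    }

    public static void main(String[] args) {
        HashFunctionFamily family = new HashFunctionFamily() {
        };
        // Every key has a family, so the optimized join is selected.
        System.out.println(chooseJoin(new HashFunctionFamily[] { family, family }));
        // One key has no family, so the plain hybrid hash join is selected.
        System.out.println(chooseJoin(new HashFunctionFamily[] { family, null }));
    }
}
```

In this sketch the plain hybrid hash join is reachable only if some entry in the array is null, which is the condition worth checking against the real provider.
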
Then, how do we assign the hash function family for each key variable?

{code:title=JobGenHelper.java|borderStyle=solid}
public static IBinaryHashFunctionFamily[] variablesToBinaryHashFunctionFamilies(
        Collection<LogicalVariable> varLogical, IVariableTypeEnvironment env, JobGenContext context)
        throws AlgebricksException {
    IBinaryHashFunctionFamily[] funFamilies = new IBinaryHashFunctionFamily[varLogical.size()];
    int i = 0;
    IBinaryHashFunctionFamilyProvider bhffProvider = context.getBinaryHashFunctionFamilyProvider();
    for (LogicalVariable var : varLogical) {
        Object type = env.getVarType(var);
        funFamilies[i++] = bhffProvider.getBinaryHashFunctionFamily(type);
    }
    return funFamilies;
}
{code}

For each variable type, we try to get a hash function family. In the current codebase, AqlBinaryHashFunctionFamilyProvider is the only class that implements IBinaryHashFunctionFamilyProvider, and it returns AMurmurHash3BinaryHashFunctionFamily for every type. So the hash function family can never be null, and the hybrid hash join branch is never taken.

{code:title=AqlBinaryHashFunctionFamilyProvider.java|borderStyle=solid}
public class AqlBinaryHashFunctionFamilyProvider implements IBinaryHashFunctionFamilyProvider, Serializable {

    private static final long serialVersionUID = 1L;
    public static final AqlBinaryHashFunctionFamilyProvider INSTANCE = new AqlBinaryHashFunctionFamilyProvider();

    private AqlBinaryHashFunctionFamilyProvider() {
    }

    @Override
    public IBinaryHashFunctionFamily getBinaryHashFunctionFamily(Object type) throws AlgebricksException {
        // AMurmurHash3BinaryHashFunctionFamily converts numeric type to double type before doing hash()
        return AMurmurHash3BinaryHashFunctionFamily.INSTANCE;
    }
}
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)