+1

On Sat, Nov 19, 2016 at 12:32 PM, Mike Carey <[email protected]> wrote:
> +1 for removal of now-defunct operations!
>
> On 11/19/16 12:01 PM, Taewoo Kim wrote:
>
>> Hi all,
>>
>> Please share your thoughts on this issue. In short, Grace Hash Join and
>> Hybrid Hash Join are not being used. We only use Optimized Hybrid Hash
>> Join. Therefore, I think it would be better to remove them.
>>
>> https://issues.apache.org/jira/browse/ASTERIXDB-1736
>>
>> ---------- Forwarded message ----------
>> From: Taewoo Kim (JIRA) <[email protected]>
>> Date: Fri, Nov 18, 2016 at 5:06 PM
>> Subject: [jira] [Created] (ASTERIXDB-1736) Grace Hash Join and Hybrid Hash
>> Join are not being used.
>> To: [email protected]
>>
>> Taewoo Kim created ASTERIXDB-1736:
>> -------------------------------------
>>
>>     Summary: Grace Hash Join and Hybrid Hash Join are not being used.
>>     Key: ASTERIXDB-1736
>>     URL: https://issues.apache.org/jira/browse/ASTERIXDB-1736
>>     Project: Apache AsterixDB
>>     Issue Type: Improvement
>>     Reporter: Taewoo Kim
>>     Assignee: Taewoo Kim
>>
>> As the title says, Grace Hash Join and Hybrid Hash Join are not being
>> used. I suggest that we remove these two join methods. Here are my
>> findings for these two joins.
>>
>> 1) Grace Hash Join
>> GraceHashJoinOperatorDescriptor is only called from two places:
>> org.apache.hyracks.examples.tpch.client.join and
>> TPCHCustomerOrderHashJoinTest.
>> One is a Hyracks example (tpch.client) and the other is a unit test. This
>> join is not currently used (it is never chosen during compilation).
>>
>> 2) Hybrid Hash Join
>> During compilation, the optimizer decides whether it will use Hybrid
>> Hash Join or Optimized Hybrid Hash Join.
>> If the hash function family for each key variable is set, then we use the
>> optimized hybrid hash join.
>> If not, we use the hybrid hash join. However, in fact, this path, the
>> hybrid hash join path, will never be chosen. Let's check the code.
>>
>> {code:title=HybridHashJoinPOperator.java|borderStyle=solid}
>> IBinaryHashFunctionFamily[] hashFunFamilies =
>>         JobGenHelper.variablesToBinaryHashFunctionFamilies(keysLeftBranch, env, context);
>>
>> ...
>>
>> boolean optimizedHashJoin = true;
>> for (IBinaryHashFunctionFamily family : hashFunFamilies) {
>>     if (family == null) {
>>         optimizedHashJoin = false;
>>         break;
>>     }
>> }
>>
>> if (optimizedHashJoin) {
>>     opDesc = generateOptimizedHashJoinRuntime(context, inputSchemas, keysLeft, keysRight,
>>             hashFunFamilies, comparatorFactories, predEvaluatorFactory, recDescriptor, spec);
>> } else {
>>     opDesc = generateHashJoinRuntime(context, inputSchemas, keysLeft, keysRight,
>>             hashFunFactories, comparatorFactories, predEvaluatorFactory, recDescriptor, spec);
>> }
>> {code}
>>
>> As we can see, optimizedHashJoin is set to false only when a hash family
>> is null.
>> Then, how do we assign the hash family for each key variable?
>>
>> {code:title=JobGenHelper.java|borderStyle=solid}
>> public static IBinaryHashFunctionFamily[] variablesToBinaryHashFunctionFamilies(
>>         Collection<LogicalVariable> varLogical, IVariableTypeEnvironment env,
>>         JobGenContext context) throws AlgebricksException {
>>     IBinaryHashFunctionFamily[] funFamilies = new IBinaryHashFunctionFamily[varLogical.size()];
>>     int i = 0;
>>     IBinaryHashFunctionFamilyProvider bhffProvider = context.getBinaryHashFunctionFamilyProvider();
>>     for (LogicalVariable var : varLogical) {
>>         Object type = env.getVarType(var);
>>         funFamilies[i++] = bhffProvider.getBinaryHashFunctionFamily(type);
>>     }
>>     return funFamilies;
>> }
>> {code}
>>
>> For each variable type, we try to get the hash function family. In the
>> current codebase, AqlBinaryHashFunctionFamilyProvider is the only class
>> that implements IBinaryHashFunctionFamilyProvider.
>> And for any type, it returns AMurmurHash3BinaryHashFunctionFamily.
>> So, there is no way that the hash function family is null.
>>
>> {code:title=AqlBinaryHashFunctionFamilyProvider.java|borderStyle=solid}
>> public class AqlBinaryHashFunctionFamilyProvider
>>         implements IBinaryHashFunctionFamilyProvider, Serializable {
>>
>>     private static final long serialVersionUID = 1L;
>>     public static final AqlBinaryHashFunctionFamilyProvider INSTANCE =
>>             new AqlBinaryHashFunctionFamilyProvider();
>>
>>     private AqlBinaryHashFunctionFamilyProvider() {
>>     }
>>
>>     @Override
>>     public IBinaryHashFunctionFamily getBinaryHashFunctionFamily(Object type)
>>             throws AlgebricksException {
>>         // AMurmurHash3BinaryHashFunctionFamily converts numeric type to
>>         // double type before doing hash()
>>         return AMurmurHash3BinaryHashFunctionFamily.INSTANCE;
>>     }
>> }
>> {code}
>>
>> --
>> This message was sent by Atlassian JIRA
>> (v6.3.4#6332)
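
For anyone who wants to check the reasoning in the quoted JIRA description without the full codebase, here is a minimal standalone sketch of the same control flow. It is not AsterixDB code: DeadBranchSketch, HashFunctionFamily, HashFunctionFamilyProvider, MURMUR, AQL_LIKE_PROVIDER, familiesFor and usesOptimizedHashJoin are hypothetical stand-ins, and only the shape of the null check and the "same singleton for every type" provider is taken from the excerpts above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for the Hyracks/Algebricks types quoted in the thread.
// Only the control flow is modeled; none of these names exist in AsterixDB.
class DeadBranchSketch {

    interface HashFunctionFamily {}

    interface HashFunctionFamilyProvider {
        HashFunctionFamily getFamily(Object type);
    }

    // Models AqlBinaryHashFunctionFamilyProvider: one singleton for every type.
    static final HashFunctionFamily MURMUR = new HashFunctionFamily() {};
    static final HashFunctionFamilyProvider AQL_LIKE_PROVIDER = type -> MURMUR;

    // Models variablesToBinaryHashFunctionFamilies: one provider lookup per key type.
    static HashFunctionFamily[] familiesFor(List<?> keyTypes, HashFunctionFamilyProvider provider) {
        HashFunctionFamily[] families = new HashFunctionFamily[keyTypes.size()];
        int i = 0;
        for (Object type : keyTypes) {
            families[i++] = provider.getFamily(type);
        }
        return families;
    }

    // Models the check in HybridHashJoinPOperator: any null family disables the optimized join.
    static boolean usesOptimizedHashJoin(HashFunctionFamily[] families) {
        for (HashFunctionFamily family : families) {
            if (family == null) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> keyTypes = Arrays.asList("int32", "string", "double");
        // With a provider that never returns null, the optimized branch is always taken.
        System.out.println(usesOptimizedHashJoin(familiesFor(keyTypes, AQL_LIKE_PROVIDER)));
        // Only a provider that can return null would reach the plain hybrid hash join branch.
        System.out.println(usesOptimizedHashJoin(familiesFor(keyTypes, type -> null)));
    }
}
```

The second call in main uses a made-up provider that returns null, which is the only way this sketch can reach the non-optimized branch. According to the description above, no such provider exists in the current codebase.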
