Taewoo Kim created ASTERIXDB-1736:
-------------------------------------

             Summary: Grace Hash Join and Hybrid Hash Join are not being used.
                 Key: ASTERIXDB-1736
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1736
             Project: Apache AsterixDB
          Issue Type: Improvement
            Reporter: Taewoo Kim
            Assignee: Taewoo Kim


As the title says, Grace Hash Join and Hybrid Hash Join are not being used. I suggest that we remove these two join methods. Here are my findings for these two joins.

1) Grace Hash Join
GraceHashJoinOperatorDescriptor is referenced from only two places: org.apache.hyracks.examples.tpch.client.join and TPCHCustomerOrderHashJoinTest. The first is a Hyracks example (tpch.client) and the second is a unit test. The compiler never chooses this join, so it is currently unused.

2) Hybrid Hash Join
During compilation, the optimizer decides whether to use Hybrid Hash Join or Optimized Hybrid Hash Join. If every key variable has a hash function family, the optimizer uses the optimized hybrid hash join; otherwise, it uses the plain hybrid hash join. In practice, however, the plain hybrid hash join path is never chosen. Let's check the code.

{code:title=HybridHashJoinPOperator.java|borderStyle=solid}
        IBinaryHashFunctionFamily[] hashFunFamilies = JobGenHelper.variablesToBinaryHashFunctionFamilies(keysLeftBranch,
                env, context);

        ...

        boolean optimizedHashJoin = true;
        for (IBinaryHashFunctionFamily family : hashFunFamilies) {
            if (family == null) {
                optimizedHashJoin = false;
                break;
            }
        }

        if (optimizedHashJoin) {
            opDesc = generateOptimizedHashJoinRuntime(context, inputSchemas, keysLeft, keysRight, hashFunFamilies,
                    comparatorFactories, predEvaluatorFactory, recDescriptor, spec);
        } else {
            opDesc = generateHashJoinRuntime(context, inputSchemas, keysLeft, keysRight, hashFunFactories,
                    comparatorFactories, predEvaluatorFactory, recDescriptor, spec);
        }
{code}

As we can see, optimizedHashJoin is set to false only when at least one of the hash function families is null.
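The check above boils down to "every hash function family is non-null". The following minimal sketch isolates that rule so it can be run on its own; the class name `JoinPathChooser` and the method `useOptimizedHashJoin` are hypothetical, and plain `Object` stands in for IBinaryHashFunctionFamily (an assumption made only to keep the snippet self-contained).

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical helper isolating the decision made in HybridHashJoinPOperator.
// Object stands in for IBinaryHashFunctionFamily (assumption for a self-contained sketch).
class JoinPathChooser {

    // The optimized path is chosen unless at least one key's family is null.
    static boolean useOptimizedHashJoin(Object[] hashFunFamilies) {
        return Arrays.stream(hashFunFamilies).allMatch(Objects::nonNull);
    }

    public static void main(String[] args) {
        Object family = new Object(); // stand-in for a non-null hash function family
        System.out.println(useOptimizedHashJoin(new Object[] { family, family })); // true
        System.out.println(useOptimizedHashJoin(new Object[] { family, null }));   // false
    }
}
```

`allMatch` short-circuits at the first null, like the `break` in the original loop, and returns true for an empty array, which matches the loop's initial `optimizedHashJoin = true`.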
So how is the hash function family assigned to each key variable?

{code:title=JobGenHelper.java|borderStyle=solid}
    public static IBinaryHashFunctionFamily[] variablesToBinaryHashFunctionFamilies(
            Collection<LogicalVariable> varLogical, IVariableTypeEnvironment env, JobGenContext context)
            throws AlgebricksException {
        IBinaryHashFunctionFamily[] funFamilies = new IBinaryHashFunctionFamily[varLogical.size()];
        int i = 0;
        IBinaryHashFunctionFamilyProvider bhffProvider = context.getBinaryHashFunctionFamilyProvider();
        for (LogicalVariable var : varLogical) {
            Object type = env.getVarType(var);
            funFamilies[i++] = bhffProvider.getBinaryHashFunctionFamily(type);
        }
        return funFamilies;
    }
{code}

For each key variable, JobGenHelper looks up a hash function family based on the variable's type. In the current codebase, AqlBinaryHashFunctionFamilyProvider is the only class that implements IBinaryHashFunctionFamilyProvider, and it returns AMurmurHash3BinaryHashFunctionFamily for any type. Therefore, the hash function family can never be null, and the optimized hybrid hash join is always chosen.

{code:title=AqlBinaryHashFunctionFamilyProvider.java|borderStyle=solid}
public class AqlBinaryHashFunctionFamilyProvider implements IBinaryHashFunctionFamilyProvider, Serializable {

    private static final long serialVersionUID = 1L;
    public static final AqlBinaryHashFunctionFamilyProvider INSTANCE = new AqlBinaryHashFunctionFamilyProvider();

    private AqlBinaryHashFunctionFamilyProvider() {

    }

    @Override
    public IBinaryHashFunctionFamily getBinaryHashFunctionFamily(Object type) throws AlgebricksException {
        // AMurmurHash3BinaryHashFunctionFamily converts numeric type to double type before doing hash()
        return AMurmurHash3BinaryHashFunctionFamily.INSTANCE;
    }

}
{code}
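Putting the three excerpts together, the argument can be checked with a small self-contained model. Everything in it is a hypothetical stand-in, not the AsterixDB/Hyracks API: `HashFamily` and `HashFamilyProvider` model IBinaryHashFunctionFamily and IBinaryHashFunctionFamilyProvider, `ConstantProvider` models AqlBinaryHashFunctionFamilyProvider, and `familiesFor` models JobGenHelper.variablesToBinaryHashFunctionFamilies. Because the provider returns the same non-null instance for every type, the non-null check always passes.

```java
import java.util.List;

// Self-contained model of the argument; all names are hypothetical stand-ins,
// not the AsterixDB/Hyracks API.
class HashJoinPathModel {

    // Stand-in for IBinaryHashFunctionFamily.
    interface HashFamily {
    }

    // Stand-in for IBinaryHashFunctionFamilyProvider.
    interface HashFamilyProvider {
        HashFamily getFamily(Object type);
    }

    // Models AqlBinaryHashFunctionFamilyProvider: it ignores the type and always
    // returns one shared instance, like AMurmurHash3BinaryHashFunctionFamily.INSTANCE.
    static final class ConstantProvider implements HashFamilyProvider {
        static final HashFamily MURMUR = new HashFamily() {
        };

        @Override
        public HashFamily getFamily(Object type) {
            return MURMUR; // never null, whatever the type
        }
    }

    // Models JobGenHelper.variablesToBinaryHashFunctionFamilies: one lookup per key type.
    static HashFamily[] familiesFor(List<Object> keyTypes, HashFamilyProvider provider) {
        HashFamily[] families = new HashFamily[keyTypes.size()];
        int i = 0;
        for (Object type : keyTypes) {
            families[i++] = provider.getFamily(type);
        }
        return families;
    }

    // Models the check in HybridHashJoinPOperator.
    static boolean useOptimizedHashJoin(HashFamily[] families) {
        for (HashFamily family : families) {
            if (family == null) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        HashFamilyProvider provider = new ConstantProvider();
        // Arbitrary placeholder "types"; the constant provider ignores them.
        List<Object> keyTypes = List.of("int64", "string", "double");
        System.out.println(useOptimizedHashJoin(familiesFor(keyTypes, provider))); // true
    }
}
```

In this model the `false` branch is reachable only with a provider that returns null for some type, which is exactly what the single existing provider never does.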

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
