Hi all,
While implementing the JPPD feature
(https://issues.apache.org/jira/browse/DRILL-6385), I got blocked by the
following problem: how to get the hash code of each build-side hash join
column through the dynamically generated Java code. I hope someone can
give some advice.
I plan to add the following methods to HashTableTemplate:
    public long getBuild64HashCode(int incomingRowIdx, int seedValue,
        int fieldId) throws SchemaChangeException {
      return getBuild64HashCodeInner(incomingRowIdx, seedValue, fieldId);
    }

    protected abstract long getBuild64HashCodeInner(
        @Named("incomingRowIdx") int incomingRowIdx,
        @Named("seedValue") int seedValue,
        @Named("fieldId") int fieldId) throws SchemaChangeException;
The high-level code that invokes getBuild64HashCode is in
HashJoinBatch's executeBuildPhase():
    // create runtime filter
    if (cycleNum == 0 && enableRuntimeFilter) {
      // create runtime filter and send out async
      int condFieldIndex = 0;
      for (BloomFilter bloomFilter : bloomFilters) {
        //VV
        for (int ind = 0; ind < currentRecordCount; ind++) {
          long hashCode = partitions[0].getBuild64HashCode(ind, condFieldIndex);
          bloomFilter.insert(hashCode);
        }
        condFieldIndex++;
      }
      // TODO: send out async
    }
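The loop above can be tried in isolation with a small stand-in. The following is only a sketch under stated assumptions: SimpleBloomFilter, RuntimeFilterSketch, and the BuildHasher interface are hypothetical names, not Drill's BloomFilter or HashTable classes, and the filter size and probe count are arbitrary. It fills one filter per join condition column from 64-bit hash codes, as the build-phase loop does.

```java
import java.util.BitSet;

// Hypothetical minimal Bloom filter; NOT Drill's BloomFilter implementation.
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Derives numHashes bit positions from the two halves of the 64-bit hash.
    void insert(long hash) {
        int h1 = (int) hash;
        int h2 = (int) (hash >>> 32);
        for (int i = 0; i < numHashes; i++) {
            bits.set(Math.floorMod(h1 + i * h2, numBits));
        }
    }

    // May return a false positive, but never a false negative.
    boolean mightContain(long hash) {
        int h1 = (int) hash;
        int h2 = (int) (hash >>> 32);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(Math.floorMod(h1 + i * h2, numBits))) {
                return false;
            }
        }
        return true;
    }
}

final class RuntimeFilterSketch {
    // Hypothetical stand-in for partitions[0].getBuild64HashCode(...).
    interface BuildHasher {
        long hash(int rowIdx, int condFieldIndex);
    }

    // One filter per join condition column, filled row by row.
    static SimpleBloomFilter[] fill(BuildHasher hasher, int recordCount,
                                    int numJoinColumns) {
        SimpleBloomFilter[] filters = new SimpleBloomFilter[numJoinColumns];
        for (int col = 0; col < numJoinColumns; col++) {
            filters[col] = new SimpleBloomFilter(1024, 3); // arbitrary sizing
            for (int row = 0; row < recordCount; row++) {
                filters[col].insert(hasher.hash(row, col));
            }
        }
        return filters;
    }
}
```

Every hash that was inserted is reported as possibly present, so a probe-side row whose hash is rejected by the filter cannot match any build-side row for that column.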
As you can see, the abstract method getBuild64HashCodeInner needs to
compute the hash code of one build-side column, selected by the fieldId
input parameter. To achieve this, I plan to generate a separate block of
code for each column's ValueVector and use an if statement on the
column's field id to choose between them. The method that generates this
dynamic code is below:
    private void setupGetBuild64Hash(ClassGenerator<HashTable> cg,
        MappingSet incomingMapping, VectorAccessible batch,
        LogicalExpression[] keyExprs, TypedFieldId[] buildKeyFieldIds)
        throws SchemaChangeException {
      cg.setMappingSet(incomingMapping);
      if (keyExprs == null || keyExprs.length == 0) {
        cg.getEvalBlock()._return(JExpr.lit(0));
        return; // no key expressions, so nothing more to generate
      }
      String seedValue = "seedValue";
      String fieldId = "fieldId";
      LogicalExpression seed =
          ValueExpressions.getParameterExpression(seedValue,
              Types.required(TypeProtos.MinorType.INT));
      LogicalExpression fieldIdParamExpr =
          ValueExpressions.getParameterExpression(fieldId,
              Types.required(TypeProtos.MinorType.INT));
      HoldingContainer fieldIdParamHolder = cg.addExpr(fieldIdParamExpr);
      int i = 0;
      for (LogicalExpression expr : keyExprs) {
        TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
        ValueExpressions.IntExpression targetBuildFieldIdExp =
            new ValueExpressions.IntExpression(
                targetTypeFieldId.getFieldIds()[0],
                ExpressionPosition.UNKNOWN);
        JFieldRef targetBuildSideFieldId =
            cg.addExpr(targetBuildFieldIdExp,
                ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
        // one if block per key column, selected by fieldId
        JBlock ifBlock = cg.getEvalBlock()
            ._if(fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))
            ._then();
        LogicalExpression hashExpression =
            HashPrelUtil.getHashExpression(expr, seed, incomingProbe != null);
        LogicalExpression materializedExpr =
            ExpressionTreeMaterializer.materializeAndCheckErrors(
                hashExpression, batch, context.getFunctionRegistry());
        HoldingContainer hash = cg.addExpr(materializedExpr,
            ClassGenerator.BlkCreateMode.FALSE);
        ifBlock._return(hash.getValue());
        i++;
      }
      // fall-through when fieldId matches none of the key columns
      cg.getEvalBlock()._return(JExpr.lit(0));
    }
Unfortunately, the generated code is not what I expected: the code that
reads the ValueVector and computes the hash code of the value it read does
not end up inside the if block. How can I make that code stay inside the
if block?
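
For reference, the intended shape of the generated method could be sketched by hand as follows. This is plain Java, not output from ClassGenerator: the class name ExpectedBuildHashShape, the array-backed columns, and the mix64 helper are hypothetical stand-ins for the build-side ValueVectors and Drill's hash functions. The point is only that each branch contains both the read of its column and the hash computation.

```java
// Hand-written illustration of the desired layout; not generated by Drill.
final class ExpectedBuildHashShape {
    // Hypothetical build-side key columns (ValueVectors in Drill).
    private final int[] col0;
    private final long[] col1;

    ExpectedBuildHashShape(int[] col0, long[] col1) {
        this.col0 = col0;
        this.col1 = col1;
    }

    // Placeholder 64-bit mixer; stands in for the real hash function.
    static long mix64(long value, int seed) {
        long z = value ^ ((long) seed * 0x9E3779B97F4A7C15L);
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId) {
        if (fieldId == 0) {
            // The vector read and the hash both live inside this branch.
            int value = col0[incomingRowIdx];
            return mix64(value, seedValue);
        }
        if (fieldId == 1) {
            long value = col1[incomingRowIdx];
            return mix64(value, seedValue);
        }
        // fieldId matches no key column
        return 0L;
    }
}
```

With this layout, a call for fieldId 1 never touches col0, which is the behavior the if statement in setupGetBuild64Hash is meant to produce.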