Alex Behm has posted comments on this change. Change subject: IMPALA-2956: Filters should be able to target multiple scan nodes ......................................................................
Patch Set 2: (19 comments) I'll need to take a closer look at the BE changes since I'm not very familiar with them. http://gerrit.cloudera.org:8080/#/c/2932/2/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: Line 148: const auto& id = tnode.node_id; seems simple to just inline below http://gerrit.cloudera.org:8080/#/c/2932/2/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: Line 512: int target_ndx = filter.planid_to_target_ndx.at(plan_node.node_id); avoid at() because it can throw http://gerrit.cloudera.org:8080/#/c/2932/2/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: Line 368: std::vector<TPlanNodeId> targets; TRuntimeFilterTargetDesc? http://gerrit.cloudera.org:8080/#/c/2932/2/fe/src/main/java/com/cloudera/impala/analysis/Analyzer.java File fe/src/main/java/com/cloudera/impala/analysis/Analyzer.java: Line 1994: public List<SlotId> getSlotsWithValueTransfer(SlotId slotId) { other naming ideas to consider: getValueTransferTargets(), getValueTransferDestinations()?, getDstValueTransfers(), get slotId -> srcSid Line 1999: if (targetSid.asInt() >= maxSlotId || slotId.asInt() >= maxSlotId) continue; you can check srcSlotId outside of the loop Line 2514: private int numRegisteredSlots_ = globalState_.descTbl.getMaxSlotId().asInt() + 1; final how about just numSlots_ or maxSlotId_ because all slots must be registered, so the "registered" part doesn't really add anything http://gerrit.cloudera.org:8080/#/c/2932/2/fe/src/main/java/com/cloudera/impala/planner/PlanNode.java File fe/src/main/java/com/cloudera/impala/planner/PlanNode.java: Line 633: for (int i = 0; i < filter.getTargets().size(); ++i) { consider adding this as a helper in RuntimeFilter, something like Expr getTargetExpr(PlanNodeId targetPlanNodeId) http://gerrit.cloudera.org:8080/#/c/2932/2/fe/src/main/java/com/cloudera/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/com/cloudera/impala/planner/RuntimeFilterGenerator.java: Line 98: private final List<ScanNode> targets_ = Lists.newArrayList(); similar suggestion as with the thrift structs: might make sense to package the stuff of a target into a class. right now there are 4 separate lists Line 105: // Slots from base table tuples that are in the same equivalent classes as the slots remove references to "equivalent" and "equivalence class" Line 130: private boolean hasRemoteTargets_ = false; This is not true if the filter is produced by a partitioned join? Imo, it's simpler to explain without relying on join types. It also gets confusing with num_nodes=1 where none of the distributed join types apply. Also if you look in computeHasLocalTargets() there is no mention or check on the join type. Line 183: appliedOnPartitionColumns = logic seems wrong because appliedOnPartitionColumns starts out as true (i.e. appliedOnPartitionColumns will always be true) Line 271: for (SlotId equivSlotId: analyzer.getSlotsWithValueTransfer(slotId)) { targetSlotId or targetSid Line 468: * Registers a runtime filter with a specific tuple id. ... with a specific target tuple id. Line 470: private void registerRuntimeFilter(RuntimeFilter filter, TupleId tupleId) { tupleId -> targetTid Line 486: private void unregisterRuntimeFilter(RuntimeFilter runtimeFilter) { In this new multi-target world the concept of 'registered' is a little confusing because there may RuntimeFilters in an unregistered state that are still contained in the runtimeFiltersByTid_ with one or more valid targets. What do you think of a "finalized" or "complete" concept instead of "unregistration". At some point a RuntimeFilter is finalized/complete meaning that no more targets can be added. A RuntimeFilter may be final even if it has no targets. That seems to capture the intent a little more cleanly. Line 489: targetTupleIds.add(targetNode.getTupleIds().get(0)); targetTupleIds.addAll(targetNode.getTupleIds()); Line 503: * Currently, runtime filter can only be assigned to HdfsScanNodes. runtime filters Line 519: * the scan node with tuple descriptor 'tid'. with target tuple id Line 521: private Expr computeTargetExpr(RuntimeFilter filter, TupleId tid, Analyzer analyzer) { tid -> targetTid -- To view, visit http://gerrit.cloudera.org:8080/2932 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iad2ce4e579a30616c469312a4e658140d317507b Gerrit-PatchSet: 2 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-HasComments: Yes
