Alex Behm has posted comments on this change.

Change subject: IMPALA-2956: Filters should be able to target multiple scan 
nodes
......................................................................


Patch Set 2:

(19 comments)

I'll need to take a closer look at the BE changes since I'm not very familiar 
with them.

http://gerrit.cloudera.org:8080/#/c/2932/2/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

Line 148:     const auto& id = tnode.node_id;
seems simple to just inline below


http://gerrit.cloudera.org:8080/#/c/2932/2/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

Line 512:         int target_ndx = 
filter.planid_to_target_ndx.at(plan_node.node_id);
avoid at() because it can throw


http://gerrit.cloudera.org:8080/#/c/2932/2/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

Line 368:     std::vector<TPlanNodeId> targets;
TRuntimeFilterTargetDesc?


http://gerrit.cloudera.org:8080/#/c/2932/2/fe/src/main/java/com/cloudera/impala/analysis/Analyzer.java
File fe/src/main/java/com/cloudera/impala/analysis/Analyzer.java:

Line 1994:   public List<SlotId> getSlotsWithValueTransfer(SlotId slotId) {
other naming ideas to consider:
getValueTransferTargets(), getValueTransferDestinations()?, 
getDstValueTransfers(), get 

slotId -> srcSid


Line 1999:       if (targetSid.asInt() >= maxSlotId || slotId.asInt() >= 
maxSlotId) continue;
you can check srcSlotId outside of the loop


Line 2514:     private int numRegisteredSlots_ = 
globalState_.descTbl.getMaxSlotId().asInt() + 1;
final

how about just numSlots_ or maxSlotId_ because all slots must be registered, so 
the "registered" part doesn't really add anything


http://gerrit.cloudera.org:8080/#/c/2932/2/fe/src/main/java/com/cloudera/impala/planner/PlanNode.java
File fe/src/main/java/com/cloudera/impala/planner/PlanNode.java:

Line 633:         for (int i = 0; i < filter.getTargets().size(); ++i) {
consider adding this as a helper in RuntimeFilter, something like

Expr getTargetExpr(PlanNodeId targetPlanNodeId)


http://gerrit.cloudera.org:8080/#/c/2932/2/fe/src/main/java/com/cloudera/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/com/cloudera/impala/planner/RuntimeFilterGenerator.java:

Line 98:     private final List<ScanNode> targets_ = Lists.newArrayList();
similar suggestion as with the thrift structs: might make sense to package the 
stuff of a target into a class. right now there are 4 separate lists


Line 105:     // Slots from base table tuples that are in the same equivalent 
classes as the slots
remove references to "equivalent" and "equivalence class"


Line 130:     private boolean hasRemoteTargets_ = false;
This is not true if the filter is produced by a partitioned join?

Imo, it's simpler to explain without relying on join types. It also gets 
confusing with num_nodes=1 where none of the distributed join types apply. Also 
if you look in computeHasLocalTargets() there is no mention or check on the 
join type.


Line 183:         appliedOnPartitionColumns =
logic seems wrong because appliedOnPartitionColumns starts out as true (i.e. 
appliedOnPartitionColumns will always be true)


Line 271:       for (SlotId equivSlotId: 
analyzer.getSlotsWithValueTransfer(slotId)) {
targetSlotId or targetSid


Line 468:    * Registers a runtime filter with a specific tuple id.
... with a specific target tuple id.


Line 470:   private void registerRuntimeFilter(RuntimeFilter filter, TupleId 
tupleId) {
tupleId -> targetTid


Line 486:   private void unregisterRuntimeFilter(RuntimeFilter runtimeFilter) {
In this new multi-target world the concept of 'registered' is a little 
confusing because there may RuntimeFilters in an unregistered state that are 
still contained in the runtimeFiltersByTid_ with one or more valid targets.

What do you think of a "finalized" or "complete" concept instead of 
"unregistration". At some point a RuntimeFilter is finalized/complete meaning 
that no more targets can be added. A RuntimeFilter may be final even if it has 
no targets. That seems to capture the intent a little more cleanly.


Line 489:       targetTupleIds.add(targetNode.getTupleIds().get(0));
targetTupleIds.addAll(targetNode.getTupleIds());


Line 503:    * Currently, runtime filter can only be assigned to HdfsScanNodes.
runtime filters


Line 519:    * the scan node with tuple descriptor 'tid'.
with target tuple id


Line 521:   private Expr computeTargetExpr(RuntimeFilter filter, TupleId tid, 
Analyzer analyzer) {
tid -> targetTid


-- 
To view, visit http://gerrit.cloudera.org:8080/2932
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iad2ce4e579a30616c469312a4e658140d317507b
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-HasComments: Yes

Reply via email to