Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3629: Codegen EvalConjuncts in hdfs-parquet-scanner
......................................................................


Patch Set 2:

(8 comments)

Code changes look good - I think we can make a few easy improvements that will 
help with understanding why codegen is disabled, and also improve perf further.

Have you done any benchmarking for the change?

http://gerrit.cloudera.org:8080/#/c/3774/2//COMMIT_MSG
Commit Message:

Line 7: IMPALA-3629: Codegen EvalConjuncts in hdfs-parquet-scanner
We're really codegening TransferScratchTuples() if we make the additional 
changes I suggested so you could update this.


PS2, Line 12: EvalConjunct
nit: EvalConjuncts


http://gerrit.cloudera.org:8080/#/c/3774/2/be/src/exec/hdfs-parquet-scanner-ir.cc
File be/src/exec/hdfs-parquet-scanner-ir.cc:

PS2, Line 47: tuple_size
We know the value of tuple_size at prepare time so can optimise out this branch 
and a bunch of dead code. How about we optimise this further and make 
tuple_size an argument to TransferScratchTuples()? We can then replace it with 
a constant. E.g. see what we do for build_filters_arg in 
partitioned-hash-join-node.

We could also do the same for has_filters.


PS2, Line 72: has_conjuncts 
This has_conjuncts check was for perf to avoid calling into EvalConjuncts(). 
Now thatt his is codegen'd we can remove the check.


Line 86:   if (scratch_batch_->AtEnd()) {
It would be better to avoid cross-compiling this resource transfer code. I took 
a quick look and I think it makes sense to move it to the callsite of 
TransferScratchTuples().

The general principle is that we should only codegen perf-critical stuff, since 
there's a cost associated with it.


http://gerrit.cloudera.org:8080/#/c/3774/2/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 386:   DCHECK(codegen != NULL);
Swap this with the SCOPED_TIMER above - this order doesn't make sense.


Line 389:     codegen->GetFunction(IRFunction::TRANSFER_SCRATCH_TUPLES, true);
Indent 4 spaces.


Line 393:   if (!ExecNode::CodegenEvalConjuncts(node->runtime_state(), 
conjunct_ctxs,
We've been trying to move towards logging more info about why codegen failed, 
e.g. the logic I pasted below from partitioned-aggregation-node. It would be 
good if we could do something similar here to make this return a non-ok Status 
instead of the NULL pointer if codegen fails.

  bool codegen_enabled = false;
  Status codegen_status;
  if (state->codegen_enabled()) {
    codegen_status = is_streaming_preagg_ ? CodegenProcessBatchStreaming()
                                          : CodegenProcessBatch();
    codegen_enabled = codegen_status.ok();
  }
  AddCodegenExecOption(codegen_enabled, codegen_status);
  return Status::OK();


-- 
To view, visit http://gerrit.cloudera.org:8080/3774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic327e437c7cd2b3f92cdb11c1e907bfee2d44ee8
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to