Tim Armstrong has posted comments on this change. Change subject: IMPALA-3629: Codegen EvalConjuncts in hdfs-parquet-scanner ......................................................................
Patch Set 2: (8 comments) Code changes look good - I think we can make a few easy improvements that will help with understanding why codegen is disabled, and also improve perf further. Have you done any benchmarking for the change? http://gerrit.cloudera.org:8080/#/c/3774/2//COMMIT_MSG Commit Message: Line 7: IMPALA-3629: Codegen EvalConjuncts in hdfs-parquet-scanner We're really codegening TransferScratchTuples() if we make the additional changes I suggested so you could update this. PS2, Line 12: EvalConjunct nit: EvalConjuncts http://gerrit.cloudera.org:8080/#/c/3774/2/be/src/exec/hdfs-parquet-scanner-ir.cc File be/src/exec/hdfs-parquet-scanner-ir.cc: PS2, Line 47: tuple_size We know the value of tuple_size at prepare time so can optimise out this branch and a bunch of dead code. How about we optimise this further and make tuple_size an argument to TransferScratchTuples()? We can then replace it with a constant. E.g. see what we do for build_filters_arg in partitioned-hash-join-node. We could also do the same for has_filters. PS2, Line 72: has_conjuncts This has_conjuncts check was for perf to avoid calling into EvalConjuncts(). Now thatt his is codegen'd we can remove the check. Line 86: if (scratch_batch_->AtEnd()) { It would be better to avoid cross-compiling this resource transfer code. I took a quick look and I think it makes sense to move it to the callsite of TransferScratchTuples(). The general principle is that we should only codegen perf-critical stuff, since there's a cost associated with it. http://gerrit.cloudera.org:8080/#/c/3774/2/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 386: DCHECK(codegen != NULL); Swap this with the SCOPED_TIMER above - this order doesn't make sense. Line 389: codegen->GetFunction(IRFunction::TRANSFER_SCRATCH_TUPLES, true); Indent 4 spaces. Line 393: if (!ExecNode::CodegenEvalConjuncts(node->runtime_state(), conjunct_ctxs, We've been trying to move towards logging more info about why codegen failed, e.g. the logic I pasted below from partitioned-aggregation-node. It would be good if we could do something similar here to make this return a non-ok Status instead of the NULL pointer if codegen fails. bool codegen_enabled = false; Status codegen_status; if (state->codegen_enabled()) { codegen_status = is_streaming_preagg_ ? CodegenProcessBatchStreaming() : CodegenProcessBatch(); codegen_enabled = codegen_status.ok(); } AddCodegenExecOption(codegen_enabled, codegen_status); return Status::OK(); -- To view, visit http://gerrit.cloudera.org:8080/3774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic327e437c7cd2b3f92cdb11c1e907bfee2d44ee8 Gerrit-PatchSet: 2 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
