Alex Behm has posted comments on this change.

Change subject: IMPALA-2736: Basic column-wise slot materialization in Parquet scanner.
......................................................................


Patch Set 3:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/2779/3/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 195: RowBatch batch;
> The embedded RowBatch is a little funky but this seems like the best soluti
Right.

Line 207: //~ScratchTupleBatch() { pool->FreeAll(); }
> ?
Removed.

Line 1729: int HdfsParquetScanner::TransferScratchTuples(ScratchTupleBatch* scratch_batch) {
> Consider adding __restrict__ annotation. I suspect all of the member variab
Thanks for the suggestions, Tim. I rewrote this function to be more performant (but possibly less readable).

I ran a simple experiment to see the difference between the "basic" loop in this patch set and the optimized one in the new patch set (I hacked the scan node to not output any rows):

select l_linenumber from huge_lineitem

Before: 1.5s
After: 1.3s

This is averaged over 10 runs on a release build with a single impalad and scanner thread, after running a dummy query to load the table metadata.

Looks like a decent perf improvement, but it should be weighed against the loss in readability (which is not super bad, imo). I don't want to linger too long on micro-optimizations just yet, since there are so many macro-optimizations left that will give us more bang for the buck (in a dev-time-spent vs. perf-benefit sense).
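To make the shape of the rewrite concrete, here is a simplified, self-contained sketch of the kind of transformation I mean: hoist the loop-invariant loads out of the hot loop, bump a single pointer instead of recomputing offsets, and add __restrict__ so the compiler knows the pointers don't alias. This is not the patch code; ScratchBatch, CountSelectedBasic, etc. are made-up names just for illustration.

  #include <cstdint>

  struct ScratchBatch {
    uint8_t* tuple_mem;   // contiguous, fixed-stride tuple memory
    int num_tuples;
    int tuple_byte_size;
  };

  // "Basic" version: re-reads the batch fields and recomputes the tuple
  // offset on every iteration.
  int CountSelectedBasic(const ScratchBatch* batch, bool (*eval)(const uint8_t*)) {
    int n = 0;
    for (int i = 0; i < batch->num_tuples; ++i) {
      const uint8_t* tuple = batch->tuple_mem + i * batch->tuple_byte_size;
      if (eval(tuple)) ++n;
    }
    return n;
  }

  // Optimized version: loop bound and stride hoisted into locals, the tuple
  // pointer is bumped instead of recomputed, and __restrict__ (gcc/clang)
  // tells the compiler the pointers don't alias.
  int CountSelectedHoisted(const ScratchBatch* __restrict__ batch,
                           bool (*eval)(const uint8_t*)) {
    const int num_tuples = batch->num_tuples;
    const int stride = batch->tuple_byte_size;
    const uint8_t* __restrict__ tuple = batch->tuple_mem;
    int n = 0;
    for (int i = 0; i < num_tuples; ++i, tuple += stride) {
      if (eval(tuple)) ++n;
    }
    return n;
  }

The numbers above are for the real scanner loop, not this toy example, but the transformation is the same flavor.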
Line 1738: output_row->SetTuple(scan_node_->tuple_idx(), scratch_tuple);
> If this loop is performance-sensitive, consider hoisting this pointer indir
Rewrote this function. See comment above.

Line 1742: if (!scanner_conjunct_ctxs_->empty() && !ExecNode::EvalConjuncts(
> Not your change, but we should consider just copying the scanner_conjunct_c
I made some local changes here to follow your suggestions, but did not copy the conjunct_ctxs into the scanner.

Line 1823: int num_row_to_commit = TransferScratchTuples(scratch_batch);
> There's an assumption here that the remaining scratch rows will fit in the
Done


http://gerrit.cloudera.org:8080/#/c/2779/3/be/src/util/rle-encoding.h
File be/src/util/rle-encoding.h:

Line 249: // significantly better than UNLIKELY(literal_count_ == 0 && repeat_count_ == 0)
> Hopefully as this progresses we might be able to use a batched interface th
Correct. I'm already working on batch-reading and caching the def/rep levels.

Line 250: if (repeat_count_ == 0) {
> Would (repeat_count_ + literal_count_) == 0 be better still? Not sure if th
Might be better, no idea. Will try it.
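For reference, here are the three predicate shapes under discussion, written as standalone functions (this is not the actual RleDecoder code; repeat_count and literal_count are stand-ins for the repeat_count_ / literal_count_ members):

  // (a) Original form: short-circuit && means two compares and typically two
  //     conditional branches.
  inline bool BothExhaustedAnd(int repeat_count, int literal_count) {
    return literal_count == 0 && repeat_count == 0;
  }

  // (b) The form at line 250: test only the repeat counter up front.
  inline bool RepeatExhausted(int repeat_count) {
    return repeat_count == 0;
  }

  // (c) Tim's suggestion: both counters are non-negative, so their sum is
  //     zero exactly when both are zero. One add plus one compare, with no
  //     short-circuit branch.
  inline bool BothExhaustedSum(int repeat_count, int literal_count) {
    return (repeat_count + literal_count) == 0;
  }

Whether (c) actually beats (b) will depend on how predictable the branch is in practice, which is what trying it should tell us.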
-- 
To view, visit http://gerrit.cloudera.org:8080/2779
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I72a613fa805c542e39df20588fb25c57b5f139aa
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Alex Behm <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Skye Wanderman-Milne <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes