Alex Behm has posted comments on this change. Change subject: PREVIEW: Basic column-wise slot materialization in Parquet scanner. ......................................................................
Patch Set 1: (5 comments) Thanks, Tim! As discussed in person, let's stick to materializing in a row format for now because we need to output results in rows. It is not clear that materializing to columns then copying to rows is worth the extra complexity without also immediately reaping the column-wise benefits (e.g. SIMD predicate eval). The biggest benefit is expected from late materialization which is possible in both designs and probably easier in the current design. The new patch set contains several bugfixes. http://gerrit.cloudera.org:8080/#/c/2779/1/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 325: continue_execution = ReadValue(pool, tuple); > I think you could make these non-virtual calls without duplicating code by Good idea. As discussed I will give this a try. I'll make this change in its own patch set so one can compare the before/after easily without also looking at extra differences. Line 1732: // and return an output batch with relatively few rows. > The TODO describes the current intended behaviour, so that sounds right. I Yea, I think so, too. Do you think it's obvious enough for me to remove the TODO? Line 1737: // Optimization for scans with selective filters/conjuncts: None of the > Is this factoring in accumulated disk buffers? As you've correctly pointed out, it's not clear whether this optimization is even useful by itself. I removed this optimization until we have hard data that suggests that we need something like it. Line 1740: scratch_batch->pool->Clear(); > Are you seeing the MemPool show up in the profile? The current allocation a This was a "premature" optimization. I was just assuming that reusing the memory would be better. Removed for now pending further investigation. Line 1829: col_reader->ReadValueBatch(scratch_batch.pool.get(), batch_size, > Ignoring return value? Correct, that was a bug. Fixed. -- To view, visit http://gerrit.cloudera.org:8080/2779 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I72a613fa805c542e39df20588fb25c57b5f139aa Gerrit-PatchSet: 1 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Alex Behm <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
