[Impala-CR](cdh5-trunk) IMPALA-2328 Parquet scan should use min/max stats

Alex Behm (Code Review) Fri, 15 Jul 2016 17:16:54 -0700

Alex Behm has posted comments on this change.

Change subject: IMPALA-2328 Parquet scan should use min/max stats
......................................................................



Patch Set 1:

Thanks for posting your patch!

I have a few suggestions regarding the high-level approach that I'd like to see 
addressed before further reviewing/accepting this patch.

IMO, these are the steps for pruning row groups based on min/max:
1. In the Impala Frontend, analyze the predicates assigned to an HdfsScanNode 
and generate a list of applicable min predicates as well as max predicates that 
are going to be evaluated against a scan tuple.
2. Ship those lists of predicates to the BE for execution (need to change the 
corresponding thrift structs).
3. In the Backend, while doing a Parquet scan, create and materialize a min 
tuple from the stats of the current row group and evaluate the list of min 
predicates against it. Then do the same with a max tuple and the max 
predicates. The row group is pruned if any of the min/max predicates returns 
false.
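
To make the three steps concrete, here is a minimal sketch in Python. It is not
Impala code: Predicate, split_predicates and can_prune_row_group are
hypothetical names, the stats tuples are modeled as plain dicts, and only
simple "column op literal" conjuncts are handled. The assumption is that < and
<= are checked against the min tuple, > and >= against the max tuple, and = is
checked against both. A column without stats never causes pruning.

```python
import operator
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

# Comparison operators supported by this sketch (an assumption, not Impala's list).
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


@dataclass(frozen=True)
class Predicate:
    """Hypothetical stand-in for a conjunct of the form `column op literal`."""
    column: str
    op: str
    literal: Any

    def evaluate(self, stats_tuple: Dict[str, Any]) -> bool:
        # Missing stats: stay conservative and never prune on this predicate.
        if self.column not in stats_tuple:
            return True
        return OPS[self.op](stats_tuple[self.column], self.literal)


def split_predicates(conjuncts: List[Predicate]) -> Tuple[List[Predicate], List[Predicate]]:
    """Step 1 (frontend): derive the min-predicate and max-predicate lists."""
    min_preds: List[Predicate] = []
    max_preds: List[Predicate] = []
    for p in conjuncts:
        if p.op in ("<", "<="):
            # x < c can only match if the smallest value satisfies it.
            min_preds.append(p)
        elif p.op in (">", ">="):
            # x > c can only match if the largest value satisfies it.
            max_preds.append(p)
        elif p.op == "=":
            # x = c can only match if min <= c and max >= c.
            min_preds.append(Predicate(p.column, "<=", p.literal))
            max_preds.append(Predicate(p.column, ">=", p.literal))
        # Anything else is not usable for min/max pruning in this sketch.
    return min_preds, max_preds


def can_prune_row_group(min_tuple: Dict[str, Any], max_tuple: Dict[str, Any],
                        min_preds: List[Predicate],
                        max_preds: List[Predicate]) -> bool:
    """Step 3 (backend): prune if any min or max predicate evaluates to false."""
    if any(not p.evaluate(min_tuple) for p in min_preds):
        return True
    return any(not p.evaluate(max_tuple) for p in max_preds)
```

For example, with a row group whose column x ranges over [10, 20], the
predicate x < 5 becomes a min predicate that evaluates to false on the min
tuple, so the row group is skipped; x > 15 becomes a max predicate that holds
on the max tuple, so the row group still has to be scanned.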

I will leave a few more detailed comments in the code as to what I think are 
the right and not-so-right design choices.

Thanks for working on this!

-- 
To view, visit http://gerrit.cloudera.org:8080/3623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I91de1f4d0fb2a982d06cd344e41901e3bf3c2cea
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Jian Wu <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Jian Wu <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: No
