Jian Wu has posted comments on this change.

Change subject: IMPALA-2328 Parquet scan should use min/max stats
......................................................................

Patch Set 1:

> Thanks for posting your patch!
>
> I have a few suggestions regarding the high-level approach that I'd
> like to see addressed before further reviewing/accepting this
> patch.
>
> Imo, these are the steps for pruning row groups based on min/max:
> 1. In the Impala Frontend, analyze the predicates assigned to an
> HdfsScanNode and generate a list of applicable min predicates as
> well as max predicates that are going to be evaluated against a
> scan tuple.
> 2. Ship those lists of predicates to the BE for execution (need to
> change the corresponding thrift structs).
> 3. In the Backend, while doing a Parquet scan, create and
> materialize a min tuple based on the current row group and evaluate
> the list of min predicates. Then do the same for the max
> predicates. The row group is pruned if any of the min/max
> predicates return false.
>
> I will leave a few more detailed comments in the code as to what I
> think are the right and not-so-right design choices.
>
> Thanks for working on this!

Thanks for your comments. I had actually considered the approach you suggested. The problem arises when a predicate compares two columns, such as id_1 > id_2. Separate min and max tuples won't help in that case, because we can't determine whether the intersection of the two ranges is empty from the min tuple alone or from the max tuple alone. What do you think?

--
To view, visit http://gerrit.cloudera.org:8080/3623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I91de1f4d0fb2a982d06cd344e41901e3bf3c2cea
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Jian Wu <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Jian Wu <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: No
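
For illustration only, here is a minimal Python sketch of the two-column concern raised in the reply above. It is not Impala code: `ColumnStats`, `naive_can_prune`, and `range_can_prune` are hypothetical names, and the "naive" function is just one literal reading of evaluating id_1 > id_2 against a min tuple and a max tuple separately. The range-based check is ordinary interval reasoning, shown for contrast, and is not claimed to be what the patch implements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical per-row-group min/max statistics for one column."""
    min_val: int
    max_val: int


def naive_can_prune(id_1: ColumnStats, id_2: ColumnStats) -> bool:
    """Evaluate id_1 > id_2 on the min tuple and on the max tuple
    separately, and prune if either evaluation is false."""
    min_tuple_ok = id_1.min_val > id_2.min_val
    max_tuple_ok = id_1.max_val > id_2.max_val
    return not (min_tuple_ok and max_tuple_ok)


def range_can_prune(id_1: ColumnStats, id_2: ColumnStats) -> bool:
    """Some row can satisfy id_1 > id_2 only if max(id_1) > min(id_2),
    so pruning is safe only when that comparison fails."""
    return not (id_1.max_val > id_2.min_val)


# Assumed example data: id_1 ranges over [0, 10], id_2 over [5, 20].
id_1 = ColumnStats(min_val=0, max_val=10)
id_2 = ColumnStats(min_val=5, max_val=20)

# Two rows consistent with those statistics; the second satisfies id_1 > id_2.
rows = [(0, 20), (10, 5)]
has_match = any(a > b for a, b in rows)

print("row group contains a match:", has_match)
print("naive min/max tuple check prunes:", naive_can_prune(id_1, id_2))
print("range-based check prunes:", range_can_prune(id_1, id_2))
```

In this example the per-tuple evaluation would discard a row group that contains a qualifying row, whereas the check that combines the max of one column with the min of the other keeps it.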
