GitHub user eowhadi opened a pull request:

    https://github.com/apache/incubator-trafodion/pull/255

    [TRAFODION-1662] Predicate push down revisited (V2)

    Currently, Trafodion predicate push down to HBase supports only the 
following cases:
    <Column><op><Value> AND <Column><op><Value> AND…
    and requires that columns are “SERIALIZED” (can be compared using a binary 
comparator),
    and that the value data type is not a superset of the column data type,
    and that char types are not case insensitive or upshifted,
    and there is no support for Big Numbers.
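    The “SERIALIZED” requirement can be illustrated with a short sketch: 
pushdown compares raw cell bytes the way HBase's BinaryComparator does, an 
unsigned byte-wise compare, so it only gives SQL-correct answers when the 
column encoding is order-preserving. The class below is illustrative, not 
Trafodion code.

```java
// Minimal sketch (hypothetical, not Trafodion code): unsigned lexicographic
// byte compare, as an HBase BinaryComparator performs it.
public class BinaryCompare {
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF, y = b[i] & 0xFF; // treat bytes as unsigned
            if (x != y) return x - y;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Big-endian unsigned encodings are order-preserving ("SERIALIZED"):
        byte[] be10  = {0x00, 0x0A}; // 10
        byte[] be256 = {0x01, 0x00}; // 256
        System.out.println(compare(be10, be256) < 0);  // true: 10 < 256

        // A little-endian encoding is not: the byte order lies about magnitude.
        byte[] le10  = {0x0A, 0x00}; // 10
        byte[] le256 = {0x00, 0x01}; // 256
        System.out.println(compare(le10, le256) < 0);  // false
    }
}
```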
    It suffers from several issues:
    Handling of nullable columns:
    When a nullable column is involved in the predicate, because of the way 
nulls are handled in Trafodion (a null can either be a missing cell, or a cell 
with the first byte set to xFF), a binary compare cannot do a good job of 
semantically treating NULL the way a SQL expression would require. So the 
current behavior is that null column values are never filtered out and always 
returned, letting Trafodion perform a second-pass predicate evaluation to deal 
with nulls. This can quickly turn counterproductive for very sparse columns, 
as we perform useless filtering at the region server side (since all nulls 
pass), and the optimizer has not been coded to turn off the feature on sparse 
columns.
    In addition, since null handling is done on the Trafodion side, the current 
code artificially pulls up all key columns to make sure that a null coded as an 
absent cell is correctly pushed up for evaluation at the Trafodion layer. This 
could be optimized by requiring only a single non-nullable column in the 
current code, but that is another story… as you will see below, the proposed 
new way of doing pushdown handles nulls 100% at the HBase layer, therefore 
requiring the addition of a non-nullable column only when a nullable column is 
needed in the select list (not in the predicate).
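    The two NULL encodings described above can be sketched as a tiny helper 
(the name and shape are hypothetical, not Trafodion's actual code): a SQL NULL 
is either an absent cell or a cell whose first value byte is 0xFF.

```java
// Hypothetical sketch of the two on-disk encodings of SQL NULL: a missing
// cell, or a present cell whose first byte is the 0xFF null indicator.
public class NullCheck {
    static boolean isSqlNull(byte[] cellValue) {
        return cellValue == null                                        // absent cell
            || (cellValue.length > 0 && cellValue[0] == (byte) 0xFF);   // null indicator
    }

    public static void main(String[] args) {
        System.out.println(isSqlNull(null));                       // true
        System.out.println(isSqlNull(new byte[]{(byte) 0xFF, 1})); // true
        System.out.println(isSqlNull(new byte[]{0x00, 42}));       // false
    }
}
```

    A byte-wise comparator sees these two encodings as unrelated values, which 
is why NULL semantics cannot be delegated to a plain binary compare.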
    Always returning predicate columns:
    Select a from t where b>10 would always return the b column to Trafodion, 
even if b is non-nullable. This is not necessary and results in useless 
network and CPU consumption, even if the predicate is not re-evaluated.
    The new advanced predicate push down feature will do the following:
    Support any of these primitives:
    <col><op><value>
    <col><op><col>      (nice to have; the high cost of a custom filter brings 
low value after a TPC-DS query survey)
    IS NULL
    IS NOT NULL
    LIKE        -> to be investigated, not yet covered in this document
    And combinations of these primitives with an arbitrary number of ORs and 
ANDs grouped with ( ) associations, given that within ( ) there is only either 
any number of ORs or any number of ANDs; no mixing of OR and AND inside ( ). I 
suspect that the normalizer will always convert expressions so that this 
mixing never happens…
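    This composition rule can be modeled with a short sketch, assuming each 
parenthesized group maps to an HBase FilterList-style operator (MUST_PASS_ALL 
for AND, MUST_PASS_ONE for OR); since a group holds exactly one operator, 
mixing AND and OR inside ( ) is impossible by construction. Names are 
illustrative, not Trafodion code.

```java
import java.util.List;

// Hypothetical model of the supported predicate shapes: a group is either
// all-AND (MUST_PASS_ALL) or all-OR (MUST_PASS_ONE), mirroring HBase's
// FilterList operators; groups can nest, but never mix the two.
public class PredTree {
    enum Op { MUST_PASS_ALL, MUST_PASS_ONE } // AND / OR

    interface Pred { boolean eval(java.util.Map<String, Integer> row); }

    static Pred group(Op op, List<Pred> kids) {
        return row -> {
            for (Pred p : kids) {
                boolean r = p.eval(row);
                if (op == Op.MUST_PASS_ALL && !r) return false; // AND short-circuit
                if (op == Op.MUST_PASS_ONE && r) return true;   // OR short-circuit
            }
            return op == Op.MUST_PASS_ALL;
        };
    }

    static Pred gt(String col, int v) {
        return row -> row.get(col) != null && row.get(col) > v;
    }

    public static void main(String[] args) {
        // (a > 10 OR b > 10) AND c > 0
        Pred p = group(Op.MUST_PASS_ALL, List.of(
            group(Op.MUST_PASS_ONE, List.of(gt("a", 10), gt("b", 10))),
            gt("c", 0)));
        System.out.println(p.eval(java.util.Map.of("a", 5, "b", 20, "c", 1))); // true
        System.out.println(p.eval(java.util.Map.of("a", 5, "b", 20, "c", 0))); // false
    }
}
```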
    It will also remove the two shortcomings of the previous implementation: 
all null cases will be handled at the HBase layer, never requiring 
re-evaluation and the associated pushing up of null columns, and predicate 
columns will not be pushed up unless the node needs them for some task other 
than predicate evaluation.
    Note that BETWEEN and IN predicates, when normalized to one of the forms 
supported above, will be pushed down too. Nothing needs to be done in the code 
to support this.
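    A quick illustration of why no extra code is needed (hypothetical names, 
not the normalizer's actual output): both predicates rewrite into the 
primitives already listed.

```java
// Hypothetical illustration: BETWEEN and IN normalize to the already
// supported <col><op><value> primitives combined with AND / OR.
public class Normalize {
    // b BETWEEN 5 AND 9  =>  b >= 5 AND b <= 9
    static boolean between(int b, int lo, int hi) { return b >= lo && b <= hi; }

    // b IN (3, 7)  =>  b = 3 OR b = 7
    static boolean in(int b, int... vals) {
        for (int v : vals) if (b == v) return true;
        return false;
    }

    public static void main(String[] args) {
        System.out.println(between(7, 5, 9)); // true
        System.out.println(in(7, 3, 7));      // true
        System.out.println(in(4, 3, 7));      // false
    }
}
```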
    Improvement of explain:
    We currently do not show predicate push down information in the scan node. 
Two pieces of information are needed:
    Is predicate push down used?
    What columns are retrieved by the scan node? (Investigate why we get 
“column all” instead of accurate information.)
    The first is obviously used to determine if all the conditions are met for 
push down to be available, and the second is used to make sure we are not 
pushing up data from columns we don’t need.
    Note that column info is inconsistently shown today. This needs to be 
fixed.
    Enablement: the existing ON/OFF CQD (HBASE_FILTER_PREDS) will be replaced 
with a multi-value CQD that enables various levels of push down optimization, 
like the one we have for the PCODE optimization level.
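    A sketch of what a multi-level CQD could look like. Only '2' is confirmed 
by the commit message below as enabling the new code; the meanings of the 
other values here are assumptions for illustration.

```java
// Hypothetical sketch of a multi-level HBASE_FILTER_PREDS CQD, analogous to
// PCODE optimization levels. '2' enabling pushdown V2 matches the commit
// message; the '0'/'OFF' and '1'/'ON' mappings are assumed, not confirmed.
public class PushdownLevel {
    static int parse(String cqd) {
        switch (cqd.trim().toUpperCase()) {
            case "OFF": case "0": return 0; // no pushdown (assumed)
            case "ON":  case "1": return 1; // legacy V1 pushdown (assumed)
            case "2":             return 2; // advanced pushdown V2
            default:
                throw new IllegalArgumentException(
                    "bad HBASE_FILTER_PREDS value: " + cqd);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("ON")); // 1
        System.out.println(parse("2"));  // 2
    }
}
```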

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/eowhadi/incubator-trafodion 
predicatePushdownV2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-trafodion/pull/255.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #255
    
----
commit 1c5f243f7c79e9ceb4a008099e60641d90515037
Author: Eric Owhadi <[email protected]>
Date:   2016-01-07T01:25:54Z

    First commit for advanced predicate pushdown feature (also known as 
pushdown V2)
    associated JIRA TRAFODION-1662 Predicate push down revisited (V2). The JIRA 
contains a blueprint document, useful to understand what the code is supposed 
to do.
    This code is enabled using CQD hbase_filter_preds '2', and bypassed 
otherwise, except for the change implemented in ValueDesc.cpp, which is a 
global bug fix: a ValueIdSet is supposed to contain a set of ValueIds ANDed 
together, and should not contain any ValueId with operator ITM_AND.

commit 8a6f2205c630ff6599eacca247bc4fbe508aa136
Author: Eric Owhadi <[email protected]>
Date:   2016-01-07T01:34:06Z

    Merge branch 'master' of github.com:apache/incubator-trafodion into 
predicatePushdownV2

commit 38573bff44e90a4b6bb82d03af8d83631a6e38bb
Author: Eric Owhadi <[email protected]>
Date:   2016-01-08T01:59:05Z

    Merge branch 'master' of github.com:apache/incubator-trafodion into 
predicatePushdownV2

commit 90795250785f50cc0538284f22b1b8589a84734a
Author: Eric Owhadi <[email protected]>
Date:   2016-01-08T15:05:38Z

    Fix an issue where the optimization on key column addition should be 
turned off for MDAM scans, and update EXPECTEDTESTRTS to show the new expected 
value, as bytes read show a 53% improvement over the previous code

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---