GitHub user eowhadi opened a pull request:
https://github.com/apache/incubator-trafodion/pull/255
[TRAFODION-1662] Predicate push down revisited (V2)
Currently, Trafodion predicate push down to HBase supports only the
following cases:
<Column><op><Value> AND <Column><op><Value> AND ...
and requires that columns be "SERIALIZED" (comparable using a binary
comparator),
that the value data type not be a superset of the column data type,
that the char type not be case insensitive or upshifted,
and there is no support for Big Numbers.
It suffers from several issues:
Handling of nullable columns:
When a nullable column is involved in the predicate, because of the way
NULLs are handled in Trafodion (either a missing cell, or a cell with its first
byte set to 0xFF), a binary compare cannot do a good job of semantically
treating NULL the way a SQL expression requires. So the current behavior is
that NULL column values are never filtered out and are always returned, letting
Trafodion perform a second-pass predicate evaluation to deal with NULLs. This
can quickly turn counterproductive for very sparse columns, as we would perform
useless filtering on the region server side (since all NULLs pass), and the
optimizer has not been coded to turn off the feature on sparse columns.
In addition, since NULL handling is done on the Trafodion side, the current
code artificially pulls all key columns to make sure that a NULL coded as an
absent cell is correctly pushed up for evaluation at the Trafodion layer. This
could be optimized in the current code by requiring only a single non-nullable
column, but that is another story... As you will see below, the proposed new
way of doing pushdown will handle 100% of NULLs at the HBase layer, therefore
requiring the addition of a non-nullable column only when a nullable column is
needed in the select statement (not in the predicate).
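As an illustration (not the actual Trafodion or HBase code), the NULL semantics described above can be sketched in Python. The two NULL encodings (missing cell, 0xFF first byte) come from the text above; all function names are hypothetical:

```python
# Hypothetical sketch of how a region-server-side filter could evaluate a
# predicate against Trafodion's two NULL encodings: a missing cell, or a
# cell whose first byte is 0xFF. Names are illustrative, not Trafodion code.

NULL_FIRST_BYTE = 0xFF

def is_sql_null(cell_value):
    """A cell is SQL NULL if it is absent or starts with the 0xFF marker."""
    return cell_value is None or (
        len(cell_value) > 0 and cell_value[0] == NULL_FIRST_BYTE)

def passes_predicate(cell_value, op, literal):
    """SQL three-valued logic: any comparison with NULL is UNKNOWN, so the
    row is filtered out directly at the HBase layer (the V2 behavior),
    instead of being returned for re-evaluation (the V1 behavior)."""
    if is_sql_null(cell_value):
        return False
    ops = {'>': cell_value > literal,
           '<': cell_value < literal,
           '=': cell_value == literal}
    return ops[op]

# A present, non-null value is compared normally; both NULL encodings
# are rejected at the region server, with no second pass needed:
assert passes_predicate(b'\x00\x0b', '>', b'\x00\x0a')
assert not passes_predicate(None, '>', b'\x00\x0a')        # missing cell
assert not passes_predicate(b'\xff\x00', '>', b'\x00\x0a') # 0xFF-prefixed
```

The point of the sketch is the first `if`: once NULL is recognized inside the filter itself, no NULL rows need to be pushed up for a second evaluation pass.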
Always returning predicate columns:
Select a from t where b>10 would always return the b column to Trafododion,
even if b is non-nullable. This is not necessary and results in useless
network and CPU consumption, even when the predicate is not re-evaluated.
The new advanced predicate push down feature will do the following:
Support any of these primitives:
<col><op><value>
<col><op><col> (nice to have; a TPC-DS query survey showed the high cost
of a custom filter would bring low value)
IS NULL
IS NOT NULL
LIKE -> to be investigated, not yet covered in this document
And combinations of these primitives with an arbitrary number of OR and AND
with ( ) associations, given that within ( ) there is only either any number of
ORs or any number of ANDs, with no mixing of OR and AND inside ( ). I suspect
that the normalizer will always convert expressions so that this mixing never
happens...
And it will remove the two shortcomings of the previous implementation: all
NULL cases will be handled at the HBase layer, never requiring re-evaluation
and the associated pushing up of NULL columns, and predicate columns will not
be pushed up if they are not needed by the node for tasks other than predicate
evaluation.
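The allowed combination shape (parenthesized groups that are all-OR or all-AND, joined at the top level) maps naturally onto nested filter lists, similar in spirit to HBase's MUST_PASS_ALL/MUST_PASS_ONE FilterList operators. A toy Python evaluator of that shape, with all names illustrative rather than taken from the patch:

```python
# Toy model of the supported predicate shape: a top-level AND (or OR) over
# parenthesized sub-groups, where each group contains only ORs or only ANDs,
# never a mix. This mirrors how nested HBase FilterLists combine
# MUST_PASS_ALL and MUST_PASS_ONE. Illustrative only, not Trafodion code.

def eval_group(op, results):
    """op is 'AND' or 'OR'; results are the boolean outcomes of the
    primitives (<col><op><value>, IS NULL, IS NOT NULL) in one group."""
    return all(results) if op == 'AND' else any(results)

def eval_predicate(top_op, groups):
    """groups: list of (op, [bool, ...]) pairs, one homogeneous op per
    parenthesized group; top_op joins the group results."""
    return eval_group(top_op, [eval_group(op, r) for op, r in groups])

# (a > 1 OR b IS NULL) AND (c = 5 OR d < 2), with sample primitive outcomes:
assert eval_predicate('AND', [('OR', [False, True]), ('OR', [True, False])])
# A group in which no primitive holds makes the whole top-level AND fail:
assert not eval_predicate('AND', [('OR', [False, False]), ('OR', [True, True])])
```

Because each group is homogeneous, two nesting levels of filter lists are enough; that is presumably why mixing OR and AND inside ( ) is excluded.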
Note that BETWEEN and IN predicates, when normalized to one of the forms
supported above, will be pushed down too. Nothing in the code needs to be
done to support this.
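To see why no extra code is needed, here is a sketch of the normalizer rewrites involved, expressed in Python with hypothetical names; both rewrites land in the AND/OR-of-primitives form described above:

```python
# Sketch of why BETWEEN and IN push down for free: the normalizer rewrites
# them into the already-supported AND/OR combinations of primitives.
# Function and tuple shapes are illustrative, not the actual normalizer API.

def normalize_between(col, low, high):
    # col BETWEEN low AND high  =>  col >= low AND col <= high
    return ('AND', [(col, '>=', low), (col, '<=', high)])

def normalize_in(col, values):
    # col IN (v1, v2, ...)  =>  col = v1 OR col = v2 OR ...
    return ('OR', [(col, '=', v) for v in values])

# b BETWEEN 5 AND 10 becomes two ANDed comparisons:
assert normalize_between('b', 5, 10) == \
    ('AND', [('b', '>=', 5), ('b', '<=', 10)])
# b IN (1, 2) becomes ORed equality predicates:
assert normalize_in('b', [1, 2]) == \
    ('OR', [('b', '=', 1), ('b', '=', 2)])
```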
Improvement of explain:
We currently do not show predicate push down information in the scan node.
Two pieces of information are needed:
Is predicate push down used?
What columns are retrieved by the scan node (investigate why we get "column
all" instead of accurate information)?
The first is obviously used to determine whether all the conditions are met
for push down to be available, and the second to make sure we are not pushing
up data from columns we don't need.
Note that column info is inconsistently shown today. This needs to be fixed.
Enablement: the existing ON/OFF CQD (HBASE_FILTER_PREDS) will be replaced
with a multi-value CQD that enables various levels of push down optimization,
like the levels we have for PCODE optimization.
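For example, per the first commit message in this PR, the new behavior is selected with value '2' (the full set of accepted level values is not enumerated here):

```sql
-- Enable the V2 (advanced) predicate pushdown; the commit notes the feature
-- is bypassed at other settings. Other level values are not listed in this PR.
CQD HBASE_FILTER_PREDS '2';
```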
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/eowhadi/incubator-trafodion
predicatePushdownV2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-trafodion/pull/255.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #255
----
commit 1c5f243f7c79e9ceb4a008099e60641d90515037
Author: Eric Owhadi <[email protected]>
Date: 2016-01-07T01:25:54Z
First commit for advanced predicate pushdown feature (also known as
pushdown V2)
associated JIRA TRAFODION-1662 Predicate push down revisited (V2). The JIRA
contains a blueprint document, useful to understand what the code is supposed
to do.
This code is enabled using CQD hbase_filter_preds '2', and bypassed
otherwise, except for the change implemented in ValueDesc.cpp, which is a
global bug fix: a ValueIdSet is supposed to contain a set of ValueIds ANDed
together, and should not contain any ValueId with operator ITM_AND.
commit 8a6f2205c630ff6599eacca247bc4fbe508aa136
Author: Eric Owhadi <[email protected]>
Date: 2016-01-07T01:34:06Z
Merge branch 'master' of github.com:apache/incubator-trafodion into
predicatePushdownV2
commit 38573bff44e90a4b6bb82d03af8d83631a6e38bb
Author: Eric Owhadi <[email protected]>
Date: 2016-01-08T01:59:05Z
Merge branch 'master' of github.com:apache/incubator-trafodion into
predicatePushdownV2
commit 90795250785f50cc0538284f22b1b8589a84734a
Author: Eric Owhadi <[email protected]>
Date: 2016-01-08T15:05:38Z
Fix issue where the optimization on key column addition should be turned off
for MDAM scans, and update EXPECTEDTESTRTS to showcase the new expected value
for bytes read, showing a 53% improvement over the previous code
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---