eugen yushin created DRILL-4666:
-----------------------------------
Summary: Pushdown doesn't apply for HBase with substr(key) from UT
Key: DRILL-4666
URL: https://issues.apache.org/jira/browse/DRILL-4666
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.6.0
Reporter: eugen yushin
Following the
[example|https://github.com/apache/drill/blob/95623912ebf348962fe8a8846c5f47c5fdcf2f78/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseFilterPushDown.java]
test, running the query from {{testFilterPushDownCompositeBigIntRowKey1()}}
results in the following execution plan:
{code}
EXPLAIN PLAN FOR
SELECT
CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'bigint_be') d
,CONVERT_FROM(BYTE_SUBSTR(row_key, 9, 8), 'bigint_be') id
,CONVERT_FROM(tableName.f.c, 'UTF8')
FROM hbase.`TestTableCompositeDate` tableName
WHERE
CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'bigint_be') = cast(1409040000000
as bigint)
;
+------+------+
| text | json |
+------+------+
| 00-00 Screen
00-01 Project(d=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($0, 1, 8))],
id=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($0, 9, 8))],
EXPR$2=[CONVERT_FROMUTF8(ITEM($1, 'c'))])
00-02 SelectionVectorRemover
00-03 Filter(condition=[=(CONVERT_FROM(BYTE_SUBSTR($0, 1, 8),
'bigint_be'), 1409040000000)])
00-04 Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec
[tableName=TestTableCompositeDate, startRow=null, stopRow=null, filter=null],
columns=[`*`]]])
{code}
From the above, Drill performs a full table scan (startRow, stopRow and filter
are all null) and only then filters rows by the key substring starting at the
1st position.
This query executes quickly on the test dataset provided in the repo, but
performance degrades dramatically in real use cases.
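To make the expected behaviour concrete, here is a minimal sketch of the row-key range that an equality predicate on the first 8 key bytes corresponds to. It assumes a pushdown would translate the predicate into a start/stop row range. The class {{RowKeyPrefixRange}} and its methods are hypothetical names for illustration, not part of Drill.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Drill: computes the row-key range that
// matches every key whose first 8 bytes equal a given big-endian bigint.
class RowKeyPrefixRange {

    // Inclusive lower bound: the 8-byte big-endian encoding of the value
    // (ByteBuffer uses big-endian byte order by default).
    static byte[] startRow(long prefix) {
        return ByteBuffer.allocate(Long.BYTES).putLong(prefix).array();
    }

    // Exclusive upper bound: the encoding of prefix + 1.
    // Assumption: prefix is non-negative, so the big-endian encoding sorts
    // in the same order as the unsigned byte comparison used for row keys.
    // Math.addExact throws ArithmeticException if prefix is Long.MAX_VALUE.
    static byte[] stopRow(long prefix) {
        return ByteBuffer.allocate(Long.BYTES)
                .putLong(Math.addExact(prefix, 1L))
                .array();
    }

    // Renders bytes as lowercase hex, two digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long d = 1409040000000L; // value from the WHERE clause in this report
        System.out.println("startRow=" + toHex(startRow(d)));
        System.out.println("stopRow=" + toHex(stopRow(d)));
    }
}
```

For the value used in the query, this gives startRow=0000014811542400 and stopRow=0000014811542401, whereas the plan above shows startRow=null and stopRow=null.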
I've used
_contrib\storage-hbase\src\test\java\org\apache\drill\hbase\TestTableGenerator.java_
to populate the test table.
Moreover, [TestHBaseFilterPushDown|TestHBaseFilterPushDown.java] relies on
[runHBaseSQLVerifyCount|https://github.com/apache/drill/blob/95623912ebf348962fe8a8846c5f47c5fdcf2f78/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseFilterPushDown.java]
to verify the tests. It checks only the result set row count, not the
execution plan, so the tests pass even when the filter is not pushed down.
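A plan-level check could catch this. The sketch below is only an illustration of the idea: it inspects the EXPLAIN text for a scan spec with no row range and no filter. {{PushDownPlanCheck}} and {{isFullScan}} are hypothetical names, not existing Drill test utilities, and the pattern assumes the plan text is formatted as in the output above.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Drill's test suite: detects a plan whose
// HBase scan spec carries no start row, no stop row and no filter.
class PushDownPlanCheck {

    // \s* tolerates the line wraps that appear in the printed plan text.
    private static final Pattern FULL_SCAN = Pattern.compile(
            "startRow=null,\\s*stopRow=null,\\s*filter=null");

    // Returns true when the plan text shows an unrestricted scan,
    // i.e. nothing was pushed down into the scan spec.
    static boolean isFullScan(String planText) {
        return FULL_SCAN.matcher(planText).find();
    }

    public static void main(String[] args) {
        // Scan fragment copied from the plan in this report.
        String reported =
                "00-04 Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec\n"
              + "[tableName=TestTableCompositeDate, startRow=null, stopRow=null, filter=null],\n"
              + "columns=[`*`]]])";
        System.out.println("full scan: " + isFullScan(reported));
    }
}
```

A test asserting that {{isFullScan}} is false for this query would fail against 1.6.0, which a row count check alone cannot do.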