[
https://issues.apache.org/jira/browse/DRILL-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
eugen yushin updated DRILL-4666:
--------------------------------
Description:
Following
[example|https://github.com/apache/drill/blob/95623912ebf348962fe8a8846c5f47c5fdcf2f78/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseFilterPushDown.java]
section, running query from {{testFilterPushDownCompositeBigIntRowKey1()}}
results in following execution plan:
{code}
EXPLAIN PLAN FOR
SELECT
CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'bigint_be') d
,CONVERT_FROM(BYTE_SUBSTR(row_key, 9, 8), 'bigint_be') id
,CONVERT_FROM(tableName.f.c, 'UTF8')
FROM hbase.`TestTableCompositeDate` tableName
WHERE
CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'bigint_be') = cast(1409040000000
as bigint)
;
+------+------+
| text | json |
+------+------+
| 00-00 Screen
00-01 Project(d=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($0, 1, 8))],
id=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($0, 9, 8))],
EXPR$2=[CONVERT_FROMUTF8(ITEM($1, 'c'))])
00-02 SelectionVectorRemover
00-03 Filter(condition=[=(CONVERT_FROM(BYTE_SUBSTR($0, 1, 8),
'bigint_be'), 1409040000000)])
00-04 Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec
[tableName=TestTableCompositeDate, startRow=null, stopRow=null, filter=null],
columns=[`*`]]])
{code}
>From the above, Drill uses full scan and then filters out rows by key
>substring started from 1st position.
This query executes pretty fast in test dataset provided in repo, but
performance dramatically decreases with real use cases.
I've used
_contrib\storage-hbase\src\test\java\org\apache\drill\hbase\TestTableGenerator.java_
to populate test table.
Moreover,
[TestHBaseFilterPushDown|https://github.com/apache/drill/blob/95623912ebf348962fe8a8846c5f47c5fdcf2f78/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseFilterPushDown.java]
uses
[runHBaseSQLVerifyCount|https://github.com/apache/drill/blob/95623912ebf348962fe8a8846c5f47c5fdcf2f78/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/BaseHBaseTest.java]
to pass the tests. It checks result set count, and not execution plan.
was:
Following
[example|https://github.com/apache/drill/blob/95623912ebf348962fe8a8846c5f47c5fdcf2f78/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseFilterPushDown.java]
section, running query from {{testFilterPushDownCompositeBigIntRowKey1()}}
results in following execution plan:
{code}
EXPLAIN PLAN FOR
SELECT
CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'bigint_be') d
,CONVERT_FROM(BYTE_SUBSTR(row_key, 9, 8), 'bigint_be') id
,CONVERT_FROM(tableName.f.c, 'UTF8')
FROM hbase.`TestTableCompositeDate` tableName
WHERE
CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'bigint_be') = cast(1409040000000
as bigint)
;
+------+------+
| text | json |
+------+------+
| 00-00 Screen
00-01 Project(d=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($0, 1, 8))],
id=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($0, 9, 8))],
EXPR$2=[CONVERT_FROMUTF8(ITEM($1, 'c'))])
00-02 SelectionVectorRemover
00-03 Filter(condition=[=(CONVERT_FROM(BYTE_SUBSTR($0, 1, 8),
'bigint_be'), 1409040000000)])
00-04 Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec
[tableName=TestTableCompositeDate, startRow=null, stopRow=null, filter=null],
columns=[`*`]]])
{code}
>From the above, Drill uses full scan and then filters out rows by key
>substring started from 1st position.
This query executes pretty fast in test dataset provided in repo, but
performance dramatically decreases with real use cases.
I've used
_contrib\storage-hbase\src\test\java\org\apache\drill\hbase\TestTableGenerator.java_
to populate test table.
Moreover, [TestHBaseFilterPushDown|TestHBaseFilterPushDown.java] uses
[runHBaseSQLVerifyCount|https://github.com/apache/drill/blob/95623912ebf348962fe8a8846c5f47c5fdcf2f78/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseFilterPushDown.java]
to pass the tests. It checks result set count, and not execution plan.
> Pushdown doesn't apply for HBase with substr(key) from UT
> ---------------------------------------------------------
>
> Key: DRILL-4666
> URL: https://issues.apache.org/jira/browse/DRILL-4666
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.6.0
> Reporter: eugen yushin
>
> Following
> [example|https://github.com/apache/drill/blob/95623912ebf348962fe8a8846c5f47c5fdcf2f78/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseFilterPushDown.java]
> section, running query from {{testFilterPushDownCompositeBigIntRowKey1()}}
> results in following execution plan:
> {code}
> EXPLAIN PLAN FOR
> SELECT
> CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'bigint_be') d
> ,CONVERT_FROM(BYTE_SUBSTR(row_key, 9, 8), 'bigint_be') id
> ,CONVERT_FROM(tableName.f.c, 'UTF8')
> FROM hbase.`TestTableCompositeDate` tableName
> WHERE
> CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'bigint_be') =
> cast(1409040000000 as bigint)
> ;
> +------+------+
> | text | json |
> +------+------+
> | 00-00 Screen
> 00-01 Project(d=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($0, 1, 8))],
> id=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($0, 9, 8))],
> EXPR$2=[CONVERT_FROMUTF8(ITEM($1, 'c'))])
> 00-02 SelectionVectorRemover
> 00-03 Filter(condition=[=(CONVERT_FROM(BYTE_SUBSTR($0, 1, 8),
> 'bigint_be'), 1409040000000)])
> 00-04 Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec
> [tableName=TestTableCompositeDate, startRow=null, stopRow=null, filter=null],
> columns=[`*`]]])
> {code}
> From the above, Drill uses full scan and then filters out rows by key
> substring started from 1st position.
> This query executes pretty fast in test dataset provided in repo, but
> performance dramatically decreases with real use cases.
> I've used
> _contrib\storage-hbase\src\test\java\org\apache\drill\hbase\TestTableGenerator.java_
> to populate test table.
> Moreover,
> [TestHBaseFilterPushDown|https://github.com/apache/drill/blob/95623912ebf348962fe8a8846c5f47c5fdcf2f78/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseFilterPushDown.java]
> uses
> [runHBaseSQLVerifyCount|https://github.com/apache/drill/blob/95623912ebf348962fe8a8846c5f47c5fdcf2f78/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/BaseHBaseTest.java]
> to pass the tests. It checks result set count, and not execution plan.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)