[ https://issues.apache.org/jira/browse/DRILL-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

eugen yushin updated DRILL-4666:
--------------------------------
    Description: 
Following the 
[example|https://github.com/apache/drill/blob/95623912ebf348962fe8a8846c5f47c5fdcf2f78/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseFilterPushDown.java]
 section, running the query from {{testFilterPushDownCompositeBigIntRowKey1()}} 
results in the following execution plan:
{code}
EXPLAIN PLAN FOR
SELECT
     CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'bigint_be') d
    ,CONVERT_FROM(BYTE_SUBSTR(row_key, 9, 8), 'bigint_be') id
    ,CONVERT_FROM(tableName.f.c, 'UTF8')
FROM hbase.`TestTableCompositeDate` tableName
WHERE
    CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'bigint_be') = cast(1409040000000 as bigint)
;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(d=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($0, 1, 8))], id=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($0, 9, 8))], EXPR$2=[CONVERT_FROMUTF8(ITEM($1, 'c'))])
00-02        SelectionVectorRemover
00-03          Filter(condition=[=(CONVERT_FROM(BYTE_SUBSTR($0, 1, 8), 'bigint_be'), 1409040000000)])
00-04            Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=TestTableCompositeDate, startRow=null, stopRow=null, filter=null], columns=[`*`]]])
{code}

From the above, Drill performs a full scan and only then filters rows by the 
key substring starting at the 1st position.
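
For comparison, a pushed-down plan would be expected to carry a row key range instead of {{startRow=null, stopRow=null}}. The sketch below shows one way such a range could be derived from the 8-byte big-endian prefix used in the query. It is an illustration only, not Drill's actual implementation; the class and method names ({{RowKeyPrefixRange}}, {{bigintBe}}, {{stopRowForPrefix}}) are hypothetical, and it assumes the stop row is exclusive.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class RowKeyPrefixRange {
    // Encode a long as 8 big-endian bytes (ByteBuffer is big-endian by default),
    // which is the layout the query reads back with 'bigint_be'.
    static byte[] bigintBe(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Smallest byte string that sorts after every key starting with the prefix.
    // Assumed to be used as an exclusive stop row; an empty array means
    // "no upper bound" (the prefix consisted only of 0xFF bytes).
    static byte[] stopRowForPrefix(byte[] prefix) {
        byte[] stop = Arrays.copyOf(prefix, prefix.length);
        for (int i = stop.length - 1; i >= 0; i--) {
            if (stop[i] != (byte) 0xFF) {
                stop[i]++;
                return Arrays.copyOf(stop, i + 1);
            }
        }
        return new byte[0];
    }

    // Render bytes as lowercase hex for display.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] start = bigintBe(1409040000000L);
        byte[] stop = stopRowForPrefix(start);
        System.out.println("startRow=" + hex(start));
        System.out.println("stopRow=" + hex(stop));
    }
}
```

With a range like this in the scan spec, only rows whose key begins with the given prefix would need to be read, rather than the whole table.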

This query executes quickly on the test dataset provided in the repo, but 
performance degrades dramatically with real use cases.

I've used 
_contrib\storage-hbase\src\test\java\org\apache\drill\hbase\TestTableGenerator.java_
 to populate the test table.

Moreover, 
[TestHBaseFilterPushDown|https://github.com/apache/drill/blob/95623912ebf348962fe8a8846c5f47c5fdcf2f78/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseFilterPushDown.java]
 uses 
[runHBaseSQLVerifyCount|https://github.com/apache/drill/blob/95623912ebf348962fe8a8846c5f47c5fdcf2f78/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/BaseHBaseTest.java]
 to pass the tests. It checks the result set count, not the execution plan.
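
A test that also inspects the plan would catch this case. The following is a minimal sketch of such a check on the EXPLAIN text; it is not part of Drill's test framework, the class and method names ({{PlanPushDownCheck}}, {{scanIsRestricted}}) are hypothetical, and it assumes the scan spec is printed in the same format as in the plan above.

```java
public class PlanPushDownCheck {
    // Returns true when the HBase scan spec in the plan text carries a row
    // range or a filter, i.e. the scan is not an unrestricted full scan.
    static boolean scanIsRestricted(String planText) {
        // Collapse line wraps and indentation so the spec can be matched as one string.
        String normalized = planText.replaceAll("\\s+", " ");
        if (!normalized.contains("HBaseScanSpec")) {
            throw new IllegalArgumentException("no HBaseScanSpec found in plan");
        }
        return !normalized.contains("startRow=null, stopRow=null, filter=null");
    }

    public static void main(String[] args) {
        // Scan line copied from the plan reported in this issue.
        String plan = "00-04 Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec "
            + "[tableName=TestTableCompositeDate, startRow=null, stopRow=null, filter=null], "
            + "columns=[`*`]]])";
        System.out.println("pushdown applied: " + scanIsRestricted(plan)); // prints "pushdown applied: false"
    }
}
```

A count-only assertion passes for both the pushed-down and the full-scan plan, since the result set is the same either way; a check like the one above distinguishes the two.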

  was:
Following the 
[example|https://github.com/apache/drill/blob/95623912ebf348962fe8a8846c5f47c5fdcf2f78/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseFilterPushDown.java]
 section, running the query from {{testFilterPushDownCompositeBigIntRowKey1()}} 
results in the following execution plan:
{code}
EXPLAIN PLAN FOR
SELECT
     CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'bigint_be') d
    ,CONVERT_FROM(BYTE_SUBSTR(row_key, 9, 8), 'bigint_be') id
    ,CONVERT_FROM(tableName.f.c, 'UTF8')
FROM hbase.`TestTableCompositeDate` tableName
WHERE
    CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'bigint_be') = cast(1409040000000 as bigint)
;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(d=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($0, 1, 8))], id=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($0, 9, 8))], EXPR$2=[CONVERT_FROMUTF8(ITEM($1, 'c'))])
00-02        SelectionVectorRemover
00-03          Filter(condition=[=(CONVERT_FROM(BYTE_SUBSTR($0, 1, 8), 'bigint_be'), 1409040000000)])
00-04            Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=TestTableCompositeDate, startRow=null, stopRow=null, filter=null], columns=[`*`]]])
{code}

From the above, Drill performs a full scan and only then filters rows by the 
key substring starting at the 1st position.

This query executes quickly on the test dataset provided in the repo, but 
performance degrades dramatically with real use cases.

I've used 
_contrib\storage-hbase\src\test\java\org\apache\drill\hbase\TestTableGenerator.java_
 to populate the test table.

Moreover, [TestHBaseFilterPushDown|TestHBaseFilterPushDown.java] uses 
[runHBaseSQLVerifyCount|https://github.com/apache/drill/blob/95623912ebf348962fe8a8846c5f47c5fdcf2f78/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseFilterPushDown.java]
 to pass the tests. It checks the result set count, not the execution plan.


> Pushdown doesn't apply for HBase with substr(key) from UT
> ---------------------------------------------------------
>
>                 Key: DRILL-4666
>                 URL: https://issues.apache.org/jira/browse/DRILL-4666
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.6.0
>            Reporter: eugen yushin
>
> Following the 
> [example|https://github.com/apache/drill/blob/95623912ebf348962fe8a8846c5f47c5fdcf2f78/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseFilterPushDown.java]
>  section, running the query from {{testFilterPushDownCompositeBigIntRowKey1()}} 
> results in the following execution plan:
> {code}
> EXPLAIN PLAN FOR
> SELECT
>      CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'bigint_be') d
>     ,CONVERT_FROM(BYTE_SUBSTR(row_key, 9, 8), 'bigint_be') id
>     ,CONVERT_FROM(tableName.f.c, 'UTF8')
> FROM hbase.`TestTableCompositeDate` tableName
> WHERE
>     CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'bigint_be') = cast(1409040000000 as bigint)
> ;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(d=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($0, 1, 8))], id=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($0, 9, 8))], EXPR$2=[CONVERT_FROMUTF8(ITEM($1, 'c'))])
> 00-02        SelectionVectorRemover
> 00-03          Filter(condition=[=(CONVERT_FROM(BYTE_SUBSTR($0, 1, 8), 'bigint_be'), 1409040000000)])
> 00-04            Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=TestTableCompositeDate, startRow=null, stopRow=null, filter=null], columns=[`*`]]])
> {code}
> From the above, Drill performs a full scan and only then filters rows by the 
> key substring starting at the 1st position.
> This query executes quickly on the test dataset provided in the repo, but 
> performance degrades dramatically with real use cases.
> I've used 
> _contrib\storage-hbase\src\test\java\org\apache\drill\hbase\TestTableGenerator.java_
>  to populate the test table.
> Moreover, 
> [TestHBaseFilterPushDown|https://github.com/apache/drill/blob/95623912ebf348962fe8a8846c5f47c5fdcf2f78/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseFilterPushDown.java]
>  uses 
> [runHBaseSQLVerifyCount|https://github.com/apache/drill/blob/95623912ebf348962fe8a8846c5f47c5fdcf2f78/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/BaseHBaseTest.java]
>  to pass the tests. It checks the result set count, not the execution plan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
