Will Berkeley created IMPALA-7586:
-------------------------------------

             Summary: Incorrect results when querying primary = "\"" in Kudu
                 Key: IMPALA-7586
                 URL: https://issues.apache.org/jira/browse/IMPALA-7586
             Project: IMPALA
          Issue Type: Bug
            Reporter: Will Berkeley
         Attachments: impalakudu_pred_bug.profile

Version string from catalogd web ui:
{noformat}
catalogd version 3.1.0-cdh6.x-SNAPSHOT RELEASE (build 8baac7f5849b6bacb02fedeb9b3fe2b2ee9450ee)
{noformat}
A reproduction script for the impala-shell:
{noformat}
create table test(name string, primary key(name) ) stored as kudu;

insert into test values ("\"");
-- Modified 1 row(s), 0 row error(s) in 4.01s

-- row found in full table scan
select * from test;
-- Fetched 1 row(s) in 0.15s

-- row not found on = predicate (pushed to kudu)
select * from test where name="\"";
-- Fetched 0 row(s) in 0.13s

-- row found when predicate cannot be pushed to kudu
select * from test where name like "\"";
-- Fetched 1 row(s) in 0.13s
{noformat}
This was originally reported as KUDU-2575. I tried to reproduce it directly 
against Kudu using the Python client but got the expected result.
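
For readers who want to repeat the direct check, the following is a minimal sketch of what such a test might look like with the kudu-python client. It is not the script that was actually run. The helper names (scan_by_key, reproduce), the master port 7051, and the Kudu-side table name "impala::default.test" are assumptions for illustration.

```python
def scan_by_key(client, table_name, column, value):
    """Return all rows where `column` equals `value`, using an equality predicate."""
    table = client.table(table_name)
    scanner = table.scanner()
    # Comparing a column to a value builds a Kudu comparison predicate,
    # analogous to the "name = ..." predicate shown in the Impala plan.
    scanner.add_predicate(table[column] == value)
    return scanner.open().read_all_tuples()


def reproduce(master_host, master_port=7051):
    """Scan the test table for the single double-quote key (needs a live cluster)."""
    # Imported lazily so the helper above can be loaded without kudu-python.
    import kudu

    client = kudu.connect(host=master_host, port=master_port)
    # Assumption: the Impala-managed table is named "impala::default.test" in Kudu.
    return scan_by_key(client, "impala::default.test", "name", '"')
```

Calling reproduce() with the address of a Kudu master should return the inserted row if Kudu itself handles the predicate correctly, which is the outcome reported above.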

From the plan and profile, Impala is pushing down the predicate, but Kudu is 
not being scanned, possibly because the Kudu client short-circuits the scan as 
having no results based on the predicate Impala pushes down.
{noformat}
00:SCAN KUDU [default.test]
   kudu predicates: name = '"'
   mem-estimate=0B mem-reservation=0B thread-reservation=1
   tuple-ids=0 row-size=15B cardinality=unavailable
   in pipelines: 00(GETNEXT)
{noformat}
{noformat}
KUDU_SCAN_NODE (id=0)
          - AverageScannerThreadConcurrency: 0.00 (0.0)
          - InactiveTotalTime: 0ns (0)
          - KuduRemoteScanTokens: 0 (0)
          - MaterializeTupleTime(*): 0ns (0)
          - NumScannerThreadMemUnavailable: 0 (0)
          - NumScannerThreadsStarted: 1 (1)
          - PeakMemoryUsage: 24.0 KiB (24576)
          - PeakScannerThreadConcurrency: 1 (1)
          - RowBatchBytesEnqueued: 16.0 KiB (16384)
          - RowBatchQueueGetWaitTime: 0ns (0)
          - RowBatchQueuePeakMemoryUsage: 0 B (0)
          - RowBatchQueuePutWaitTime: 0ns (0)
          - RowBatchesEnqueued: 1 (1)
          - RowsRead: 0 (0)
===>  - RowsReturned: 0 (0)
          - RowsReturnedRate: 0 per second (0)
          - ScanRangesComplete: 1 (1)
          - ScannerThreadsInvoluntaryContextSwitches: 0 (0)
          - ScannerThreadsTotalWallClockTime: 0ns (0)
            - ScannerThreadsSysTime: 158.00us (158000)
            - ScannerThreadsUserTime: 0ns (0)
          - ScannerThreadsVoluntaryContextSwitches: 2 (2)
===>  - TotalKuduScanRoundTrips: 0 (0)
          - TotalTime: 1ms (1999972)
{noformat}
I also confirmed Kudu sees no scan from Impala for this query using the /scans 
page of the tablet servers.

Full profile attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
