Will Berkeley created IMPALA-7586:
-------------------------------------
Summary: Incorrect results when querying primary = "\"" in Kudu
Key: IMPALA-7586
URL: https://issues.apache.org/jira/browse/IMPALA-7586
Project: IMPALA
Issue Type: Bug
Reporter: Will Berkeley
Attachments: impalakudu_pred_bug.profile
Version string from catalogd web ui:
{noformat}
catalogd version 3.1.0-cdh6.x-SNAPSHOT RELEASE (build 8baac7f5849b6bacb02fedeb9b3fe2b2ee9450ee)
{noformat}
A reproduction script for the impala-shell:
{noformat}
create table test(name string, primary key(name) ) stored as kudu;
insert into test values ("\"");
-- Modified 1 row(s), 0 row error(s) in 4.01s
-- row found in full table scan
select * from test;
-- Fetched 1 row(s) in 0.15s
-- row not found on = predicate (pushed to kudu)
select * from test where name="\"";
-- Fetched 0 row(s) in 0.13s
-- row found when predicate cannot be pushed to kudu
select * from test where name like "\"";
-- Fetched 1 row(s) in 0.13s
{noformat}
This was originally reported as KUDU-2575. I tried to reproduce it directly
against Kudu using the Python client, but got the expected result there.
From the plan and profile, Impala is pushing down the predicate, but Kudu is
not being scanned, possibly because the Kudu client short-circuits the scan as
having no results based on the predicate Impala pushes down.
{noformat}
00:SCAN KUDU [default.test]
kudu predicates: name = '"'
mem-estimate=0B mem-reservation=0B thread-reservation=1
tuple-ids=0 row-size=15B cardinality=unavailable
in pipelines: 00(GETNEXT)
{noformat}
{noformat}
KUDU_SCAN_NODE (id=0)
- AverageScannerThreadConcurrency: 0.00 (0.0)
- InactiveTotalTime: 0ns (0)
- KuduRemoteScanTokens: 0 (0)
- MaterializeTupleTime(*): 0ns (0)
- NumScannerThreadMemUnavailable: 0 (0)
- NumScannerThreadsStarted: 1 (1)
- PeakMemoryUsage: 24.0 KiB (24576)
- PeakScannerThreadConcurrency: 1 (1)
- RowBatchBytesEnqueued: 16.0 KiB (16384)
- RowBatchQueueGetWaitTime: 0ns (0)
- RowBatchQueuePeakMemoryUsage: 0 B (0)
- RowBatchQueuePutWaitTime: 0ns (0)
- RowBatchesEnqueued: 1 (1)
- RowsRead: 0 (0)
===> - RowsReturned: 0 (0)
- RowsReturnedRate: 0 per second (0)
- ScanRangesComplete: 1 (1)
- ScannerThreadsInvoluntaryContextSwitches: 0 (0)
- ScannerThreadsTotalWallClockTime: 0ns (0)
- ScannerThreadsSysTime: 158.00us (158000)
- ScannerThreadsUserTime: 0ns (0)
- ScannerThreadsVoluntaryContextSwitches: 2 (2)
===> - TotalKuduScanRoundTrips: 0 (0)
- TotalTime: 1ms (1999972)
{noformat}
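The two marked counters carry the argument: a scanner thread was started, yet no round trip to Kudu was made and no rows were read. As a rough illustration only, the check can be sketched in Python against the profile text. The helper names (parse_counters, scan_never_reached_kudu) and the rule itself are assumptions made for this sketch, not part of Impala or Kudu.

```python
import re

# Matches counter lines such as "- RowsReturned: 0 (0)" and captures the
# counter name plus the raw value shown in the trailing parentheses.
COUNTER_RE = re.compile(r"-\s*([\w()*]+):\s.*\((-?\d+(?:\.\d+)?)\)\s*$")


def parse_counters(profile_text):
    """Return {counter name: raw numeric value} for one profile node."""
    counters = {}
    for line in profile_text.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group(1)] = float(match.group(2))
    return counters


def scan_never_reached_kudu(counters):
    """Hypothetical heuristic: a scanner thread ran, but Kudu was never contacted."""
    return (
        counters.get("NumScannerThreadsStarted", 0) >= 1
        and counters.get("TotalKuduScanRoundTrips", 0) == 0
        and counters.get("RowsRead", 0) == 0
    )


# A few lines copied from the KUDU_SCAN_NODE profile in this report.
SAMPLE = """\
KUDU_SCAN_NODE (id=0)
- NumScannerThreadsStarted: 1 (1)
- RowsRead: 0 (0)
===> - RowsReturned: 0 (0)
===> - TotalKuduScanRoundTrips: 0 (0)
- TotalTime: 1ms (1999972)
"""

if __name__ == "__main__":
    print(scan_never_reached_kudu(parse_counters(SAMPLE)))
```

A zero in TotalKuduScanRoundTrips alone does not prove a bug, since an empty table could produce the same counters. It is the combination with the successful full table scan above that points at the pushed-down predicate.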
I also confirmed Kudu sees no scan from Impala for this query using the /scans
page of the tablet servers.
Full profile attached.