James Taylor created PHOENIX-2758:
-------------------------------------
Summary: Ordered GROUP BY not occurring with leading PK equality
expression
Key: PHOENIX-2758
URL: https://issues.apache.org/jira/browse/PHOENIX-2758
Project: Phoenix
Issue Type: Bug
Reporter: James Taylor
The following query:
{code}
SELECT SUM(DUP_COUNT) FROM (
SELECT COUNT(1) As DUP_COUNT
FROM DATACLOUD.DATA_ASSESSMENT_RECORD
WHERE JOURNEY_ID='07ixx000000004J' AND
DATASOURCE=0 AND MATCH_STATUS <= 1 and
ORGANIZATION_ID='07ixx000000004J'
GROUP BY MATCH_STATUS, EXTERNAL_DATASOURCE_KEY
HAVING COUNT(1) > 1);
{code}
Should use an ORDERED DISTINCT, but doesn't.
This is the DDL:
{code}
CREATE TABLE DATACLOUD.DATA_ASSESSMENT_RECORD (ORGANIZATION_ID char(15) not
null,
JOURNEY_ID char(15) not null,
DATASOURCE SMALLINT not null,
MATCH_STATUS TINYINT not null,
EXTERNAL_DATASOURCE_KEY varchar(30),
ENTITY_ID char(15) not null,
CONSTRAINT PK PRIMARY KEY (
ORGANIZATION_ID,
JOURNEY_ID,
DATASOURCE,
MATCH_STATUS,
EXTERNAL_DATASOURCE_KEY,
ENTITY_ID))
{code}
The optimization does occur if you include the PK columns in the GROUP BY like
this:
{code}
SELECT SUM(DUP_COUNT) FROM (
SELECT COUNT(1) As DUP_COUNT
FROM DATACLOUD.DATA_ASSESSMENT_RECORD
WHERE JOURNEY_ID='07ixx000000004J' AND
DATASOURCE=0 AND MATCH_STATUS <= 1 and
ORGANIZATION_ID='07ixx000000004J'
GROUP BY ORGANIZATION_ID, JOURNEY_ID, DATASOURCE, MATCH_STATUS,
EXTERNAL_DATASOURCE_KEY
HAVING COUNT(1) > 1);
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)